When AI Meets Stocks
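The following examples all assume they run on Google Colab. If you run them in a local IDE (such as PyCharm or Spyder), some commands need minor changes. For example, `pip install` must be run in a terminal, and in a plain script a bare expression such as `df` must be wrapped in `print()` to show any output.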
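1. CNN-1: Predicting Stock Prices with AI - Next-Day Rise/Fall
1.1. Install the required packages
```
pip install yfinance
```
1.2. Download the stock price data
```python
import yfinance as yf

# 2330.TW: TSMC on the Taiwan Stock Exchange; download 10 years of daily prices
df = yf.Ticker('2330.TW').history(period='10y')
print(type(df))  # a pandas DataFrame
```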
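1.2.1. Inspect the downloaded dataset
```python
df  # in a notebook, a bare expression displays the table
#print(df[:5])  # alternative: print only the first 5 rows
```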
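1.2.2. Extract the closing prices
Read the closing prices out of the DataFrame:
```python
data = df.filter(['Close'])  # keep only the Close column, still as a DataFrame
data
```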
1.3. Inspect the raw data / daily price chart
```python
import matplotlib.pyplot as plt

plt.clf()
plt.plot(data.Close)
plt.show()
```
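1.4. Normalize the data
```python
from sklearn.preprocessing import MinMaxScaler

# scale the closing prices into [0, 1]
# note: fitting on the full series lets the test period's min/max leak into the scaling
scaler = MinMaxScaler(feature_range=(0, 1))
sc_data = scaler.fit_transform(data.values)

sc_data  # now a NumPy array
```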
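For reference, `MinMaxScaler(feature_range=(0, 1))` maps each column with x' = (x - min) / (max - min), and `inverse_transform` computes x = x' * (max - min) + min. A minimal NumPy sketch of that arithmetic follows; the helper names are hypothetical (not part of scikit-learn), the prices are toy values, and it assumes no column is constant.

```python
import numpy as np

def minmax_fit_transform(values):
    """Scale each column of a 2-D array to [0, 1]; also return the stats needed to undo it."""
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min  # assumes every column has max > min
    return (values - col_min) / col_range, col_min, col_range

def minmax_inverse(scaled, col_min, col_range):
    """Map scaled values back to the original units."""
    return scaled * col_range + col_min

prices = np.array([[500.0], [550.0], [600.0], [525.0]])  # toy closing prices, not real data
scaled, col_min, col_range = minmax_fit_transform(prices)
print(scaled.ravel())
print(minmax_inverse(scaled, col_min, col_range).ravel())
```

The lowest price maps to 0, the highest to 1, and everything else falls proportionally in between.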
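1.5. Build and split the dataset
1.5.1. Build the samples and labels
```python
import numpy as np

# use the previous N days of prices to predict the next day's price
previousNDays = 10
x_data, y_data = [], []
for i in range(len(sc_data) - previousNDays):
    x = sc_data[i:i+previousNDays]   # days i .. i+N-1
    y = sc_data[i+previousNDays]     # day i+N (the label)
    x_data.append(x)
    y_data.append(y)
# convert to NumPy arrays, since they will be fed into TensorFlow next
x_data, y_data = np.array(x_data), np.array(y_data)
# add a channel axis so each sample matches the Conv2D input_shape (previousNDays, 1, 1)
x_data = x_data.reshape(-1, previousNDays, 1, 1)

print(x_data.shape)
print(y_data.shape)
```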
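The sliding window above turns one price series into (input, label) pairs. The sketch below runs the same loop on a toy six-day series with a window of 3, so the resulting shapes can be checked by hand; `make_windows` is a hypothetical helper name.

```python
import numpy as np

def make_windows(series, n_days):
    """Pair each run of n_days consecutive rows with the row that follows it."""
    xs, ys = [], []
    for i in range(len(series) - n_days):
        xs.append(series[i:i + n_days])  # input: rows i .. i+n_days-1
        ys.append(series[i + n_days])    # label: row i+n_days
    return np.array(xs), np.array(ys)

toy = np.arange(6, dtype=float).reshape(-1, 1)  # 6 toy "days", 1 feature, shaped like sc_data
x, y = make_windows(toy, n_days=3)
x4d = x.reshape(-1, 3, 1, 1)  # Conv2D layout: (samples, days, features, channels)
print(x.shape, y.shape, x4d.shape)  # (3, 3, 1) (3, 1) (3, 3, 1, 1)
print(x[0].ravel(), y[0])  # first window [0. 1. 2.] -> label [3.]
```

A series of length L yields L - n_days samples, because the last window must still have one day after it to serve as the label.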
1.5.2. Split into training and test sets
```python
ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
# samples 0 .. train_size-1 (the earlier dates) form the training set
x_train, y_train = x_data[:train_size], y_data[:train_size]
# samples train_size .. end (the later dates) form the test set
x_test, y_test = x_data[train_size:], y_data[train_size:]
# train_test_split shuffles by default, which would mix future prices into training:
#from sklearn.model_selection import train_test_split
#x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
```
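Splitting by position rather than at random keeps every test sample later in time than every training sample, so the model is never trained on the period it is tested on. A minimal sketch with day indices standing in for samples; `chronological_split` is a hypothetical helper.

```python
import numpy as np

def chronological_split(x, y, ratio=0.8):
    """Split without shuffling: the earliest samples train, the latest samples test."""
    train_size = round(len(x) * ratio)
    return x[:train_size], y[:train_size], x[train_size:], y[train_size:]

days = np.arange(10)  # toy samples; each value is the sample's day index
x_tr, y_tr, x_te, y_te = chronological_split(days, days)
print(len(x_tr), len(x_te))     # 8 2
print(x_tr.max() < x_te.min())  # True: every test day comes after every training day
```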
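1.6. Build, compile, and train the model
1.6.1. Build the model
```python
# CNN model
import tensorflow as tf

# build the CNN model
model = tf.keras.Sequential()
# input layer / convolutional layer: 512 filters of size 1x1 over the (previousNDays, 1, 1) input
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu', input_shape=(previousNDays,1,1)))
# flatten
model.add(tf.keras.layers.Flatten())
# fully connected / output layer: sigmoid keeps the output in [0, 1], the range of the scaled prices
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
```
```python
model.summary()
```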
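The parameter counts printed by `model.summary()` can be checked by hand. A sketch of the arithmetic, assuming Keras defaults (`padding='valid'`, `strides=1`, biases enabled); the helper functions are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one weight per kernel cell, per input channel, per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

previous_n_days = 10
conv = conv2d_params(1, 1, 1, 512)  # (10, 1, 1) -> (10, 1, 512)
flat = previous_n_days * 1 * 512    # Flatten: 5120 values
dense = dense_params(flat, 1)       # Dense(1)
print(conv, flat, dense, conv + dense)  # 1024 5120 5121 6145
```

A 1x1 kernel looks at one day at a time, so the convolution keeps the 10x1 grid and only widens it to 512 channels; most of the weights sit in the final Dense layer.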
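1.6.2. Compile the model
```python
# 'accuracy' is meaningless for a continuous target; track mean absolute error instead
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
```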
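`loss='mse'` averages the squared differences between predicted and true scaled prices. Keras's `'accuracy'` metric treats the task as classification, which carries little meaning when the target is a continuous price; mean absolute error is easier to read. A small NumPy illustration on toy values:

```python
import numpy as np

y_true = np.array([0.50, 0.52, 0.49])  # toy scaled prices
y_pred = np.array([0.48, 0.53, 0.50])  # toy model outputs
mse = np.mean((y_true - y_pred) ** 2)   # what loss='mse' minimizes
mae = np.mean(np.abs(y_true - y_pred))  # average absolute error, in the same scaled units
print(mse, mae)
```

Squaring makes MSE punish large misses much more than small ones, while MAE reads directly as "off by this much on average".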
1.6.3. Train the model
```python
model.fit(x_train, y_train,
          validation_split=0.2,  # hold out the last 20% of the training samples for validation
          batch_size=200, epochs=20)
```
1.7. Performance testing
1.7.1. loss
```python
score = model.evaluate(x_test, y_test)
print('loss:', score[0])  # test MSE, in scaled (0-1) units
```
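The test loss is an MSE in scaled 0-1 units, which is hard to judge in isolation. One common yardstick, not part of the original tutorial, is a persistence baseline that predicts each next-day price to equal the last price in its window. A minimal sketch assuming single-feature windows shaped like `x_test` above; `persistence_mse` is a hypothetical helper.

```python
import numpy as np

def persistence_mse(x_windows, y_next):
    """MSE of predicting that the next value equals the last value in each window.

    Assumes a single feature, so each window flattens to one value per day.
    """
    last = x_windows.reshape(len(x_windows), -1)[:, -1]  # last day of each window
    return float(np.mean((np.ravel(y_next) - last) ** 2))

# toy scaled data: 2 windows of 3 days, 1 feature
x_demo = np.array([[[0.1], [0.2], [0.3]],
                   [[0.2], [0.3], [0.4]]])
y_demo = np.array([[0.4], [0.35]])
print(persistence_mse(x_demo, y_demo))
```

Calling `persistence_mse(x_test, y_test)` and comparing it with `score[0]` shows whether the CNN does better than simply repeating the previous close.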
1.7.2. predict
```python
predict = model.predict(x_test)
# undo the 0-1 scaling to get prices back
predict = scaler.inverse_transform(predict)
predict = np.reshape(predict, (predict.size,))
ans = scaler.inverse_transform(y_test)
ans = np.reshape(ans, (ans.size,))
print(predict[:3])
print(ans[:3])
```
1.7.3. plot
```python
plt.plot(predict, label='Predicted')
plt.plot(ans, label='Actual')
plt.legend()
plt.show()
```
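The section title is about next-day rise/fall, but the model outputs prices. One way to score direction, sketched here as an assumption rather than the tutorial's method, is to compare each predicted and actual close with the previous day's actual close; `direction_accuracy` is a hypothetical helper.

```python
import numpy as np

def direction_accuracy(prev_close, predicted, actual):
    """Fraction of days where the predicted move (up vs. not up) matches the actual move."""
    prev_close = np.asarray(prev_close)
    pred_up = np.asarray(predicted) > prev_close
    real_up = np.asarray(actual) > prev_close
    return float(np.mean(pred_up == real_up))

prev = np.array([100.0, 102.0, 101.0, 103.0])  # toy previous-day closes
pred = np.array([101.0, 101.5, 102.0, 102.5])  # toy model outputs
real = np.array([102.0, 101.0, 103.0, 104.0])  # toy actual closes
print(direction_accuracy(prev, pred, real))
```

With the arrays above, the previous actual close for test day k is `ans[k-1]`, so a call such as `direction_accuracy(ans[:-1], predict[1:], ans[1:])` scores every test day except the first.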
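1.8. Things to tinker with
- Download more raw data
- Use more features for prediction
- Use more/fewer past days for prediction
- Change the model architecture
- Change the training:test split ratio
- Increase the number of epochs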
2. CNN-2: Predicting Stock Prices with AI - Next-Day Rise/Fall
2.1. Install the required packages
```
pip install yfinance
```
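2.2. Download the stock price data
```python
import yfinance as yf

# 2330.TW: TSMC on the Taiwan Stock Exchange; download 10 years of daily prices
df = yf.Ticker('2330.TW').history(period='10y')
print(type(df))  # a pandas DataFrame
```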
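2.2.1. Inspect the downloaded dataset
```python
df  # in a notebook, a bare expression displays the table
#print(df[:5])  # alternative: print only the first 5 rows
```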
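2.2.2. Extract the needed features
This time, trading volume is taken into account as well:
```python
data = df.filter(['Close', 'Volume'])  # closing price and trading volume
data
```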
2.3. Inspect the raw data / daily price chart
```python
import matplotlib.pyplot as plt

plt.clf()
plt.plot(data.Close)
plt.show()
plt.clf()
plt.plot(data.Volume)
plt.show()
```
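2.4. Normalize the data
```python
from sklearn.preprocessing import MinMaxScaler

# scalerX scales the input features; scalerY scales only the label (Close),
# so that predictions can later be inverse-transformed on their own
scalerX = MinMaxScaler(feature_range=(0, 1))
scalerY = MinMaxScaler(feature_range=(0, 1))
all_x = data[['Volume', 'Close']]
all_y = data['Close']
print(all_x.shape)
print(all_y.shape)
scal_all_x = scalerX.fit_transform(all_x.values)
# all_y is 1-D; MinMaxScaler needs a 2-D (n_samples, 1) array
scal_all_y = scalerY.fit_transform(all_y.values.reshape(-1, 1))
```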
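Volume figures are typically several orders of magnitude larger than prices, which is why each column is scaled separately. Because `scalerX` was fitted on two columns, its `inverse_transform` expects two columns; a separate `scalerY` fitted on Close alone can invert the model's single-column output directly. A NumPy sketch of the per-column arithmetic on toy numbers:

```python
import numpy as np

# toy rows of [Volume, Close]; not real market data
raw = np.array([[30_000_000.0, 500.0],
                [45_000_000.0, 550.0],
                [15_000_000.0, 600.0]])

col_min = raw.min(axis=0)
col_range = raw.max(axis=0) - col_min
scaled = (raw - col_min) / col_range  # each column scaled on its own, as scalerX does

# a 1-column output is undone with the Close column's statistics alone,
# which is what a scaler fitted only on Close (like scalerY) stores
pred_scaled = np.array([0.5, 1.0])
pred_close = pred_scaled * col_range[1] + col_min[1]
print(scaled)
print(pred_close)  # [550. 600.]
```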
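2.5. Build and split the dataset
2.5.1. Build the samples and labels
```python
import numpy as np

# use the previous N days of data (volume and price) to predict the next day's price
previousNDays = 10
x_data, y_data = [], []
for i in range(len(scal_all_x) - previousNDays):
    x = scal_all_x[i:i+previousNDays]  # shape (previousNDays, 2)
    y = scal_all_y[i+previousNDays]    # next day's scaled Close
    x_data.append(x)
    y_data.append(y)
# convert to NumPy arrays, since they will be fed into TensorFlow next
x_data, y_data = np.array(x_data), np.array(y_data)
# add a channel axis so each sample matches the Conv2D input_shape (previousNDays, 2, 1)
x_data = x_data.reshape(-1, previousNDays, 2, 1)

print(x_data.shape)
print(y_data.shape)
```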
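`Conv2D` expects each sample to be a grid of height x width x channels, so each window is treated as a one-channel image that is `previousNDays` rows tall (days) and 2 columns wide (Volume, Close). A toy NumPy sketch of adding that channel axis:

```python
import numpy as np

n_days, n_features = 10, 2
windows = np.random.default_rng(0).random((4, n_days, n_features))  # 4 toy windows of [Volume, Close]
images = windows[..., np.newaxis]  # (4, 10, 2) -> (4, 10, 2, 1): one channel per "image"
print(windows.shape, images.shape)  # (4, 10, 2) (4, 10, 2, 1)
print(np.array_equal(images[..., 0], windows))  # True: only the shape changed
```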
2.5.2. Split into training and test sets
```python
ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
# earlier samples for training, later samples for testing (no shuffling)
x_train, y_train = x_data[:train_size], y_data[:train_size]
x_test, y_test = x_data[train_size:], y_data[train_size:]

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
```
2.6. Build, compile, and train the model
2.6.1. Build the model
```python
# CNN model
import tensorflow as tf

# build the CNN model
model = tf.keras.Sequential()
# input layer / first convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu', input_shape=(previousNDays,2,1)))
# second convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu'))
# flatten
model.add(tf.keras.layers.Flatten())
# fully connected / output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
```
```python
model.summary()
```
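As in CNN-1, the parameter counts from `model.summary()` can be checked by hand, again assuming Keras defaults (`padding='valid'`, `strides=1`, biases enabled). The second 1x1 convolution mixes all 512 channels at every grid cell, so it holds most of the weights. The helper functions are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one weight per kernel cell, per input channel, per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

previous_n_days, n_features = 10, 2
conv1 = conv2d_params(1, 1, 1, 512)    # (10, 2, 1) -> (10, 2, 512)
conv2 = conv2d_params(1, 1, 512, 512)  # (10, 2, 512) -> (10, 2, 512)
flat = previous_n_days * n_features * 512  # Flatten: 10240 values
dense = dense_params(flat, 1)
print(conv1, conv2, dense, conv1 + conv2 + dense)  # 1024 262656 10241 273921
```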
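2.6.2. Compile the model
```python
# 'accuracy' is meaningless for a continuous target; track mean absolute error instead
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
```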
2.6.3. Train the model
```python
model.fit(x_train, y_train,
          validation_split=0.2,  # hold out the last 20% of the training samples for validation
          batch_size=200, epochs=20)
```
2.7. Performance testing
2.7.1. loss
```python
score = model.evaluate(x_test, y_test)
print('loss:', score[0])  # test MSE, in scaled (0-1) units
```
2.7.2. predict
```python
predict = model.predict(x_test)
# scalerY was fitted on Close alone, so it inverts the 1-column output directly
predict = scalerY.inverse_transform(predict)
predict = np.reshape(predict, (predict.size,))
ans = scalerY.inverse_transform(y_test)
ans = np.reshape(ans, (ans.size,))
print(predict[:3])
print(ans[:3])
```
2.7.3. plot
```python
plt.plot(predict, label='Predicted')
plt.plot(ans, label='Actual')
plt.legend()
plt.show()
```
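2.8. Things to tinker with
- Download more raw data
- Use more features for prediction
- Use more/fewer past days for prediction
- Change the model architecture
- Change the training:test split ratio
- Increase the number of epochs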
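3. CNN-3: Predicting Stock Prices with AI - Rise/Fall over the Next 5 Days
To make sure everyone can hand in the assignment, here is a reference program that uses a CNN model:
- The program takes two features of the stock (here, Open and Close)
- Each prediction covers the stock price for the next 5 days
Each group should modify and test it on their own.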
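The two bullet points above change how samples are built: each input now holds `featureDays` rows of both columns, and each label holds the next `days` closing prices. A toy sketch of that windowing; `make_multistep_windows` is a hypothetical helper, and the +1 in the loop bound lets the last label end exactly at the final row.

```python
import numpy as np

def make_multistep_windows(values, feature_days, days, target_col):
    """Inputs: feature_days rows of all columns; labels: the next `days` values of target_col."""
    xs, ys = [], []
    for i in range(len(values) - feature_days - days + 1):
        xs.append(values[i:i + feature_days, :])
        ys.append(values[i + feature_days:i + feature_days + days, target_col])
    return np.array(xs), np.array(ys)

toy = np.column_stack([np.arange(20.0), np.arange(20.0) + 100])  # toy [Open, Close] columns
x, y = make_multistep_windows(toy, feature_days=10, days=5, target_col=1)
print(x.shape, y.shape)  # (6, 10, 2) (6, 5)
print(y[0])              # [110. 111. 112. 113. 114.]: Close on rows 10-14
```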
```python
import numpy as np
import tensorflow as tf
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler

df = yf.Ticker('3260.TWO').history(period='10y')
# pick two features
data = df.filter(['Open', 'Close'])

# normalize the data (each column to [0, 1])
scaler = MinMaxScaler(feature_range=(0, 1))
sc_data = scaler.fit_transform(data.values)

featureDays = 10  # number of past days used as input
days = 5          # number of future days to predict

x_data, y_data = [], []
# the +1 lets the last window's labels end exactly at the last row
for i in range(len(sc_data) - featureDays - days + 1):
    x = sc_data[i:i+featureDays, :]                    # both columns for featureDays days
    y = sc_data[i+featureDays: i+featureDays+days, 1]  # Close (column 1) for the next `days` days
    x_data.append(x)
    y_data.append(y)

x_data = np.array(x_data).reshape(-1, featureDays, 2, 1)  # reshape to fit Conv2D
y_data = np.array(y_data)

# training/test split ratio (chronological, no shuffling)
ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
x_train, y_train = x_data[:train_size], y_data[:train_size]
x_test, y_test = x_data[train_size:], y_data[train_size:]

# build the CNN model
model = tf.keras.Sequential()
# input layer / convolutional layer: 2x2 kernels over the (featureDays, 2, 1) input
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=2, activation='relu', input_shape=(featureDays,2,1)))
# flatten
model.add(tf.keras.layers.Flatten())
# fully connected / output layer: one output per predicted day
model.add(tf.keras.layers.Dense(days, activation='sigmoid'))

# 'accuracy' is meaningless for a continuous target; track mean absolute error instead
model.compile(loss=tf.keras.losses.Huber(), optimizer='RMSprop', metrics=['mae'])
history = model.fit(x_train, y_train, batch_size=100, epochs=20, validation_data=(x_test, y_test))

# evaluate and predict
score = model.evaluate(x_test, y_test)
print('loss:', score[0])
print('mae:', score[1])

predict = model.predict(x_test)
print(predict.shape)  # (number of test samples, days)
print(predict)

# undo the scaling: the scaler expects 2 columns, so place the predictions in column 1 (Close)
predict_expanded = np.zeros((predict.shape[0], predict.shape[1], 2))
predict_expanded[:, :, 1] = predict

predicted_prices = np.zeros((predict.shape[0], predict.shape[1]))
for i in range(predict.shape[1]):
    predicted_prices[:, i] = scaler.inverse_transform(predict_expanded[:, i])[:, 1]

print(predicted_prices.shape)
print(predicted_prices)
```
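The padding step at the end of the program exists because `scaler` was fitted on two columns. For a min-max scaler with range (0, 1), inverting only the Close column reduces to `value * range + min` with that column's statistics, which a fitted `MinMaxScaler` stores as `data_range_[1]` and `data_min_[1]`. A NumPy sketch on toy numbers checking that both routes agree:

```python
import numpy as np

# toy [Open, Close] data and its per-column min-max statistics
raw = np.array([[95.0, 100.0], [105.0, 110.0], [115.0, 120.0]])
col_min = raw.min(axis=0)
col_range = raw.max(axis=0) - col_min

pred = np.array([[0.0, 0.5], [1.0, 0.25]])  # toy scaled Close predictions, shape (samples, days)

# direct route: only the Close column's statistics are needed
direct = pred * col_range[1] + col_min[1]

# padding route: place the predictions in column 1, invert both columns, keep column 1
padded = np.zeros(pred.shape + (2,))
padded[:, :, 1] = pred
via_padding = padded * col_range + col_min  # inverts every column at once
print(np.allclose(direct, via_padding[:, :, 1]))  # True
```

With the program's fitted scaler, the direct route would be `predict * scaler.data_range_[1] + scaler.data_min_[1]`, which avoids the loop.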
4. LSTM: Predicting Stock Prices with AI - Rise/Fall over the Next 5 Days
4.1. Reference code
```python
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

df = yf.Ticker('3260.TWO').history(period='10y')

# pick two features
data = df.filter(['Open', 'Close'])
# scale each column to [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
sc_data = scaler.fit_transform(data.values)

featureDays = 10  # number of past days used as input
days = 5          # number of future days to predict

# build sliding windows: featureDays rows of both columns -> next `days` Close values
x_data, y_data = [], []
for i in range(len(sc_data) - featureDays - days + 1):
    x = sc_data[i:(i + featureDays), :]
    y = sc_data[(i + featureDays):(i + featureDays + days), 1]
    x_data.append(x)
    y_data.append(y)

# LSTM input shape: (samples, timesteps=featureDays, features=2); no channel axis needed
x_data, y_data = np.array(x_data), np.array(y_data)

# training and test sets (chronological)
train_size = int(len(x_data) * 0.8)
x_train, x_test = x_data[:train_size], x_data[train_size:]
y_train, y_test = y_data[:train_size], y_data[train_size:]

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(featureDays, 2)))  # outputs one vector per day
model.add(LSTM(50, return_sequences=False))  # outputs only the last day's vector
model.add(Dense(25))
model.add(Dense(days))  # one output per predicted day

model.compile(optimizer='adam', loss='mean_squared_error')

history = model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_test, y_test))

test_loss = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)

predictions = model.predict(x_test)
print(predictions)

# undo the scaling using the Close column's (column 1) minimum and range
close_min, close_range = scaler.data_min_[1], scaler.data_range_[1]
predicted_prices = predictions * close_range + close_min
actual_prices = y_test * close_range + close_min

# compare predicted and actual prices
# (flattened 5-day windows overlap, so the x-axis runs window by window, not day by day)
plt.figure(figsize=(10, 6))
plt.plot(actual_prices.flatten(), label='Actual Prices')
plt.plot(predicted_prices.flatten(), label='Predicted Prices')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()
```
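Unlike the CNN versions, the LSTM takes input shaped (samples, timesteps, features), here (N, 10, 2), with no channel axis. Its parameter count does not depend on the number of timesteps: each LSTM layer has four gates, each with input weights, recurrent weights, and a bias. A sketch of the arithmetic, which should match `model.summary()` under Keras defaults; the helper functions are hypothetical.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

layers = {
    "LSTM(50), input (10, 2)": lstm_params(50, 2),
    "LSTM(50)": lstm_params(50, 50),
    "Dense(25)": dense_params(50, 25),
    "Dense(5)": dense_params(25, 5),
}
for name, n in layers.items():
    print(f"{name}: {n}")
print("total:", sum(layers.values()))  # 32205
```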
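4.2. Things to tinker with
- Download more raw data
- Use more features for prediction
- Use more/fewer past days for prediction
- Change the model architecture
- Change the training:test split ratio
- Increase the number of epochs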