When AI Meets Stocks


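All examples below assume they are run on Google Colab. When running them in a local IDE (such as PyCharm or Spyder), the way some commands are executed needs slight adjustment.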
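As one illustration of such an adjustment: a notebook cell accepts `pip install yfinance` directly, while a plain Python script does not. A minimal sketch, assuming the helper names `running_in_colab` and `pip_install_command` (both hypothetical, not part of any library), that detects the environment and builds the equivalent command for a local interpreter:

```python
import sys

def running_in_colab():
    """Heuristic: Colab preloads the google.colab module into the kernel."""
    return "google.colab" in sys.modules

def pip_install_command(package):
    """Build the pip command for the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", package]

if __name__ == "__main__":
    where = "Colab" if running_in_colab() else "a local interpreter"
    print("Running on", where)
    # Pass the list to subprocess.check_call(...) to actually install.
    print(" ".join(str(part) for part in pip_install_command("yfinance")))
```

In a local IDE, running `pip install yfinance` once in a terminal achieves the same thing.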

1. CNN-1: Predicting Stock Prices with AI - Next-Day Movement

1.1. Install the required package

pip install yfinance

1.2. Download the stock price data

import yfinance as yf

# Download 10 years of daily data for TSMC (2330.TW) as a pandas DataFrame
df = yf.Ticker('2330.TW').history(period='10y')
print(type(df))

1.2.1. Inspect the downloaded dataset

df
#print(df[:5])

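1.2.2. Extract the closing price

Select the closing price column from the DataFrame.

# filter() returns a one-column DataFrame rather than a Series
data = df.filter(['Close'])
data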

1.3. Inspect the raw data (daily closing price chart)

import matplotlib.pyplot as plt
plt.clf()
plt.plot(data.Close)
plt.show()

1.4. Normalize the data

from sklearn.preprocessing import MinMaxScaler
# Rescale the closing prices into the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
sc_data = scaler.fit_transform(data.values)

sc_data  # now a NumPy array
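The arithmetic behind min-max scaling to [0, 1] is short enough to write out by hand. A minimal sketch in plain Python, using made-up prices and illustrative function names (not the scikit-learn implementation), that shows both the forward mapping and its inverse:

```python
def minmax_fit(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    """Map each value into [0, 1] via (x - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    """Undo the scaling: x = s * (max - min) + min."""
    return [s * (hi - lo) + lo for s in scaled]

prices = [500.0, 550.0, 600.0, 525.0]  # made-up closing prices
lo, hi = minmax_fit(prices)
scaled = minmax_transform(prices, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled)    # [0.0, 0.5, 1.0, 0.25]
print(restored)  # [500.0, 550.0, 600.0, 525.0]
```

The inverse mapping is what later turns the model's outputs back into prices.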

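1.5. Build and split the data

1.5.1. Build the dataset and labels

import numpy as np

# Use the previous N days of prices to predict the next day's price
previousNDays = 10
x_data, y_data = [], []
for i in range(len(sc_data) - previousNDays):
    x = sc_data[i:i+previousNDays]  # window of N consecutive days
    y = sc_data[i+previousNDays]    # the day right after the window
    x_data.append(x)
    y_data.append(y)
# Convert to NumPy arrays, since they are about to be fed to TensorFlow
x_data, y_data = np.array(x_data), np.array(y_data)
# Add a trailing channel axis so each sample has the Conv2D input shape (N, 1, 1)
x_data = x_data.reshape(-1, previousNDays, 1, 1)

print(x_data.shape)
print(y_data.shape)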
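The windowing loop is easier to follow on a series small enough to read at a glance. A minimal sketch in plain Python, with a toy series and a hypothetical helper name `make_windows`:

```python
def make_windows(series, n_days):
    """Pair each run of n_days values with the value that follows it."""
    xs, ys = [], []
    for i in range(len(series) - n_days):
        xs.append(series[i:i + n_days])
        ys.append(series[i + n_days])
    return xs, ys

toy = [1, 2, 3, 4, 5, 6]  # stand-in for the scaled closing prices
xs, ys = make_windows(toy, n_days=3)
for x, y in zip(xs, ys):
    print(x, "->", y)
# [1, 2, 3] -> 4
# [2, 3, 4] -> 5
# [3, 4, 5] -> 6
```

A series of length L yields L - n_days samples, which is why the loop stops at `len(...) - previousNDays`.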

1.5.2. Split into training and test sets

ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
# Samples 0 through train_size-1 form the training set
x_train, y_train = x_data[:train_size], y_data[:train_size]
# Samples train_size through the last one form the test set
x_test, y_test = x_data[train_size:], y_data[train_size:]
# Alternative: a random split, which shuffles the samples and loses the time order
#from sklearn.model_selection import train_test_split
#x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
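The slicing above keeps the samples in time order, so the model is tested only on days that come after everything it was trained on. The commented-out `train_test_split` shuffles by default and would mix the two. A minimal sketch of the chronological version, with a hypothetical helper name and sample indices standing in for the data:

```python
def chronological_split(samples, ratio):
    """Keep the first `ratio` share for training and the rest for testing."""
    train_size = round(len(samples) * ratio)
    return samples[:train_size], samples[train_size:]

days = list(range(10))  # sample indices in time order
train, test = chronological_split(days, 0.8)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
# Every training sample precedes every test sample in time.
print(max(train) < min(test))  # True
```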

1.6. Build, compile, and train the model

1.6.1. Build the model

# CNN model
import tensorflow as tf
# Build the CNN model
model = tf.keras.Sequential()
# Input layer / convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu', input_shape=(previousNDays,1,1)))
# Flatten
model.add(tf.keras.layers.Flatten())
# Fully connected layer / output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()
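The parameter counts reported by `model.summary()` can be reproduced by hand, which is a quick check that the layers are wired as intended. A minimal sketch for the layer sizes used in this section, assuming the layers keep their default bias terms (the helper names are illustrative):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """One weight per input per unit, plus one bias per unit."""
    return (n_inputs + 1) * n_units

days, features, filters = 10, 1, 512
conv = conv2d_params(1, 1, 1, filters)
# A 1x1 kernel leaves the (days, features) grid unchanged, so Flatten
# yields days * features * filters values.
flat = days * features * filters
dense = dense_params(flat, 1)
print(conv, flat, dense, conv + dense)  # 1024 5120 5121 6145
```

With `kernel_size=1`, each day is transformed on its own; only the final Dense layer combines information across the 10 days.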

1.6.2. Compile the model

# This is a regression task, so track mean absolute error rather than accuracy
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

1.6.3. Train the model

model.fit(x_train, y_train,
          validation_split=0.2,
          batch_size=200, epochs=20)

1.7. Performance evaluation

1.7.1. loss

score = model.evaluate(x_test, y_test)
print('loss:', score[0])
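The loss printed here is measured on the scaled values in [0, 1], not in price units, so it looks very small. A minimal sketch in plain Python, with made-up prices and scaler bounds, of how the two are related:

```python
def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

lo, hi = 100.0, 300.0  # made-up min and max price seen by the scaler
true_prices = [150.0, 200.0, 250.0]
pred_prices = [160.0, 190.0, 250.0]

def scale(values):
    """Min-max scale with the bounds above."""
    return [(v - lo) / (hi - lo) for v in values]

scaled_mse = mse(scale(true_prices), scale(pred_prices))
price_mse = mse(true_prices, pred_prices)
# Errors shrink by (hi - lo) under scaling, so MSE shrinks by (hi - lo) ** 2.
print(scaled_mse, price_mse)
print(scaled_mse * (hi - lo) ** 2)
```

Multiplying the scaled MSE by the squared price range recovers the MSE in price units, up to floating-point rounding.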

1.7.2. predict

predict = model.predict(x_test)
# Map the scaled predictions and labels back to prices
predict = scaler.inverse_transform(predict)
predict = np.reshape(predict, (predict.size,))
ans = scaler.inverse_transform(y_test)
ans = np.reshape(ans, (ans.size,))
print(predict[:3])
print(ans[:3])
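The model outputs a price, while the section title asks about next-day movement. One possible way to bridge the two is to compare each predicted price with the previous day's actual close and count how often the predicted direction matches the real one. A minimal sketch, assuming the hypothetical helper name `direction_hit_rate` and made-up numbers:

```python
def direction_hit_rate(actual, predicted):
    """Share of days where predicted and actual moves (relative to the
    previous actual close) point the same way. Inputs are aligned by day."""
    hits = 0
    total = len(actual) - 1
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        hits += int(actual_up == predicted_up)
    return hits / total

actual = [100.0, 102.0, 101.0, 105.0, 104.0]    # made-up closing prices
predicted = [99.0, 101.0, 103.0, 104.0, 106.0]  # made-up model outputs
print(direction_hit_rate(actual, predicted))  # 0.5
```

With the arrays computed above, it could be called as `direction_hit_rate(ans, predict)`.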

1.7.3. plot

plt.plot(predict)
plt.plot(ans)
plt.show()

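1.8. Things to tinker with

  • Load more raw data
  • Use more features for prediction
  • Use more or fewer days for prediction
  • Change the model architecture
  • Change the training:test split ratio
  • Increase the number of epochs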
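Several of these knobs can be varied together. A minimal sketch of how the combinations might be enumerated before looping over them; the candidate values are hypothetical, and the dictionary keys simply reuse the variable names from the code above:

```python
from itertools import product

# Hypothetical candidate values for three of the knobs listed above.
window_sizes = [5, 10, 20]
split_ratios = [0.7, 0.8]
epoch_counts = [20, 50]

experiments = [
    {"previousNDays": n, "ratio": r, "epochs": e}
    for n, r, e in product(window_sizes, split_ratios, epoch_counts)
]
print(len(experiments))  # 12
print(experiments[0])    # {'previousNDays': 5, 'ratio': 0.7, 'epochs': 20}
```

Each dictionary describes one run: rebuild the windows, split, train, and record the test loss for comparison.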

2. CNN-2: Predicting Stock Prices with AI - Next-Day Movement

2.1. Install the required package

pip install yfinance

2.2. Download the stock price data

import yfinance as yf

df = yf.Ticker('2330.TW').history(period='10y')
print(type(df))

2.2.1. Inspect the downloaded dataset

df
#print(df[:5])

2.2.2. Extract the required features

This time, trading volume is taken into account as well.

data = df.filter(['Close', 'Volume'])
data

2.3. Inspect the raw data (daily closing price and volume charts)

import matplotlib.pyplot as plt
plt.clf()
plt.plot(data.Close)
plt.show()
plt.clf()
plt.plot(data.Volume)
plt.show()

2.4. Normalize the data

from sklearn.preprocessing import MinMaxScaler
# Separate scalers for the features and the label, so that the
# one-column predictions can be inverse-transformed on their own
scalerX = MinMaxScaler(feature_range=(0, 1))
scalerY = MinMaxScaler(feature_range=(0, 1))
all_x = data[['Volume', 'Close']]
all_y = data['Close']
print(all_x.shape)
print(all_y.shape)
scal_all_x = scalerX.fit_transform(all_x.values)
scal_all_y = scalerY.fit_transform(all_y.values.reshape(-1, 1))

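2.5. Build and split the data

2.5.1. Build the dataset and labels

import numpy as np

# Use the previous N days of volume and price to predict the next day's price
previousNDays = 10
x_data, y_data = [], []
for i in range(len(scal_all_x) - previousNDays):
    x = scal_all_x[i:i+previousNDays]  # window of N consecutive days, 2 features each
    y = scal_all_y[i+previousNDays]    # closing price right after the window
    x_data.append(x)
    y_data.append(y)
# Convert to NumPy arrays, since they are about to be fed to TensorFlow
x_data, y_data = np.array(x_data), np.array(y_data)
# Add a trailing channel axis so each sample has the Conv2D input shape (N, 2, 1)
x_data = x_data.reshape(-1, previousNDays, 2, 1)

print(x_data.shape)
print(y_data.shape)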

2.5.2. Split into training and test sets

ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
x_train, y_train = x_data[:train_size], y_data[:train_size]
x_test, y_test = x_data[train_size:], y_data[train_size:]

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

2.6. Build, compile, and train the model

2.6.1. Build the model

# CNN model
import tensorflow as tf
# Build the CNN model
model = tf.keras.Sequential()
# Input layer / first convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu', input_shape=(previousNDays,2,1)))
# Second convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=1, activation='relu'))
# Flatten
model.add(tf.keras.layers.Flatten())
# Fully connected layer / output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()

2.6.2. Compile the model

# This is a regression task, so track mean absolute error rather than accuracy
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

2.6.3. Train the model

model.fit(x_train, y_train,
          validation_split=0.2,
          batch_size=200, epochs=20)

2.7. Performance evaluation

2.7.1. loss

score = model.evaluate(x_test, y_test)
print('loss:', score[0])

2.7.2. predict

predict = model.predict(x_test)
# Use scalerY (fitted on the closing price only) to map back to prices
predict = scalerY.inverse_transform(predict)
predict = np.reshape(predict, (predict.size,))
ans = scalerY.inverse_transform(y_test)
ans = np.reshape(ans, (ans.size,))
print(predict[:3])
print(ans[:3])

2.7.3. plot

plt.plot(predict)
plt.plot(ans)
plt.show()

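2.8. Things to tinker with

  • Load more raw data
  • Use more features for prediction
  • Use more or fewer days for prediction
  • Change the model architecture
  • Change the training:test split ratio
  • Increase the number of epochs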

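3. CNN-3: Predicting Stock Prices with AI - Movement over the Next 5 Days

So that everyone can hand in the assignment, here is a reference example program that uses a CNN model.

  • Each sample uses two features of the stock
  • Each prediction covers the closing prices of the following 5 days

Each group should modify and test it on its own.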
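Before the full program, the shape of one sample can be seen on toy data: several consecutive days of two features as input, and the closing prices of the days that follow as labels. A minimal sketch, assuming a hypothetical helper name and shorter windows than the reference program uses:

```python
def make_multistep_windows(rows, feature_days, days, target_col):
    """Inputs: feature_days consecutive rows (all columns).
    Labels: the target column of the following `days` rows."""
    xs, ys = [], []
    for i in range(len(rows) - feature_days - days + 1):
        xs.append(rows[i:i + feature_days])
        future = rows[i + feature_days:i + feature_days + days]
        ys.append([r[target_col] for r in future])
    return xs, ys

# Toy (open, close) rows for 7 days; the values are made up.
rows = [(d, d + 0.5) for d in range(7)]
xs, ys = make_multistep_windows(rows, feature_days=3, days=2, target_col=1)
print(len(xs))            # 3
print(xs[0], "->", ys[0])  # [(0, 0.5), (1, 1.5), (2, 2.5)] -> [3.5, 4.5]
```

The `+ 1` in the loop bound lets the last window end exactly on the final row.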

import yfinance as yf
df = yf.Ticker('3260.TWO').history(period='10y')
# Pick two features
data = df.filter(['Open', 'Close'])

from sklearn.preprocessing import MinMaxScaler
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
sc_data = scaler.fit_transform(data.values)

import numpy as np

featureDays = 10  # number of days used as input
days = 5          # number of days to predict

x_data, y_data = [], []
for i in range(len(sc_data) - featureDays - days + 1):
    x = sc_data[i:i+featureDays, :]
    y = sc_data[i+featureDays: i+featureDays+days, 1]  # column 1 is Close
    x_data.append(x)
    y_data.append(y)

x_data = np.array(x_data).reshape(-1, featureDays, 2, 1)  # reshape to fit Conv2D
y_data = np.array(y_data)

# Training/test split ratio
ratio = 0.8
train_size = round(len(x_data) * ratio)
print(train_size)
x_train, y_train = x_data[:train_size], y_data[:train_size]
x_test, y_test = x_data[train_size:], y_data[train_size:]

import tensorflow as tf
# Build the CNN model
model = tf.keras.Sequential()
# Input layer / convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=2, activation='relu', input_shape=(featureDays,2,1)))
# Flatten
model.add(tf.keras.layers.Flatten())
# Fully connected layer / output layer: one output per predicted day
model.add(tf.keras.layers.Dense(days, activation='sigmoid'))

# Huber loss is less sensitive to outliers than MSE;
# mean absolute error is tracked because this is a regression task
model.compile(loss='huber', optimizer='RMSprop', metrics=['mae'])
history = model.fit(x_train, y_train, batch_size=100, epochs=20, validation_data=(x_test, y_test))

# Evaluate and predict
score = model.evaluate(x_test, y_test)
print('loss:', score[0])
print('mae:', score[1])

predict = model.predict(x_test)
print(predict.shape)
print(predict)

# Undo the scaling: the scaler expects two columns, so place the predictions
# in the Close column (index 1) of a zero-filled array
predict_expanded = np.zeros((predict.shape[0], predict.shape[1], 2))
predict_expanded[:, :, 1] = predict

predicted_prices = np.zeros((predict.shape[0], predict.shape[1]))
for i in range(predict.shape[1]):
    predicted_prices[:, i] = scaler.inverse_transform(predict_expanded[:, i])[:, 1]

print(predicted_prices.shape)
print(predicted_prices)
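The zero-filled padding above is needed only because the scaler was fitted on two columns. Min-max scaling works column by column, so the same prices could also be recovered from the Close column's own minimum and maximum. A minimal sketch in plain Python with made-up bounds (the function name is an assumption for illustration):

```python
def inverse_minmax_column(scaled_values, col_min, col_max):
    """Undo (x - min) / (max - min) for a single column."""
    return [s * (col_max - col_min) + col_min for s in scaled_values]

# Made-up bounds of the Close column; with a fitted scikit-learn MinMaxScaler
# they would be scaler.data_min_[1] and scaler.data_max_[1].
close_min, close_max = 20.0, 120.0
scaled_forecast = [0.25, 0.5, 0.75, 1.0, 0.0]  # one 5-day prediction in [0, 1]
print(inverse_minmax_column(scaled_forecast, close_min, close_max))
# [45.0, 70.0, 95.0, 120.0, 20.0]
```

This relies on the scaler's `feature_range` being (0, 1), as it is in the program above.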

4. LSTM: Predicting Stock Prices with AI - Movement over the Next 5 Days

4.1. Reference code

import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

df = yf.Ticker('3260.TWO').history(period='10y')

# Pick two features
data = df.filter(['Open', 'Close'])
# No scaling is applied here: the raw prices are used directly
sc_data = data.values

featureDays = 10  # number of days used as input
days = 5          # number of days to predict

# Build the sliding windows: featureDays of (Open, Close) in, next `days` closes out
x_data, y_data = [], []
for i in range(len(sc_data) - featureDays - days + 1):
    x = sc_data[i:(i + featureDays), :]
    y = sc_data[(i + featureDays):(i + featureDays + days), 1]
    x_data.append(x)
    y_data.append(y)

x_data, y_data = np.array(x_data), np.array(y_data)

# Training and test sets
train_size = int(len(x_data) * 0.8)
x_train, x_test = x_data[:train_size], x_data[train_size:]
y_train, y_test = y_data[:train_size], y_data[train_size:]


model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(featureDays, 2)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(days))

model.compile(optimizer='adam', loss='mean_squared_error')

history = model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_test, y_test))

test_loss = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)

predictions = model.predict(x_test)

print(predictions)

# Compare the predicted prices with the actual prices
plt.figure(figsize=(10, 6))
plt.plot(np.array(y_test).flatten(), label='Actual Prices')
plt.plot(np.array(predictions).flatten(), label='Predicted Prices')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()
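The plot flattens every 5-day forecast into one line, which mixes day-1 and day-5 predictions. One way to see whether the error grows with the horizon is to score each forecast day separately. A minimal sketch in plain Python, with a hypothetical helper name and made-up numbers:

```python
def mae_per_horizon(actual, predicted):
    """Mean absolute error for each forecast step (column) separately."""
    n_steps = len(actual[0])
    return [
        sum(abs(a[k] - p[k]) for a, p in zip(actual, predicted)) / len(actual)
        for k in range(n_steps)
    ]

# Two made-up samples with a 3-day horizon.
actual = [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0]]
predicted = [[10.0, 12.0, 15.0], [11.0, 11.0, 12.0]]
print(mae_per_horizon(actual, predicted))  # [0.0, 1.0, 2.0]
```

With the arrays from the reference code, it could be called as `mae_per_horizon(y_test, predictions)`, giving one error value per forecast day.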

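4.2. Things to tinker with

  • Load more raw data
  • Use more features for prediction
  • Use more or fewer days for prediction
  • Change the model architecture
  • Change the training:test split ratio
  • Increase the number of epochs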

Author: Yung-Chin Yen

Created: 2024-05-11 Sat 09:40