ABBA-LSTM
Training on the test set
Hello, I found a few inconsistencies in the paper and the code. Let me know what you think about it.
In the proposed method, ABBA is used as a preprocessing step for the time series, and this preprocessing includes a clustering stage. Since the task is to predict the future, the clustering should not take those future steps into account: their values are still unknown at forecast time, so they must not be used during preprocessing.
There is a serious problem in defining the training and test sets, because the ABBA preprocessing changes the sampling of the time series. If the dataset is split first, the ABBA preprocessing is likely to produce outliers right at the end of the training set and the beginning of the test set; since these boundary samples are extremely important for forecasting, they cannot simply be neglected. If the ABBA preprocessing is applied first, we run into the problem described in the previous paragraph.
Because of this logical problem, there is no way to implement the split in a fully consistent way. To avoid it, the code appears to use the whole time series for training (including the test set):
train = ts[:-fcast_len]
test = ts[-fcast_len:]
# Build ABBA constructor
abba = ABBA(tol=0.05, max_k = 10, verbose=0)
# LSTM model with ABBA
t0 = time.time()
f = forecaster(ts, model=VanillaLSTM_keras(lag=lag), abba=abba)  # note: ts, not train
f.train(patience=patience, max_epoch=10000+patience)
forecast1 = f.forecast(len(test)).tolist()
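For comparison, the leakage on the first point can be avoided by fitting the preprocessing on the training split only and then encoding both splits with the fixed parameters. A toy sketch of that idea, using a 1-D k-means-style quantization of increments as a stand-in for ABBA's clustering step (this is illustrative only, not the ABBA API, and it does not resolve the boundary problem described above):

```python
import numpy as np

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
fcast_len = 40
train, test = ts[:-fcast_len], ts[-fcast_len:]

def fit_centroids(x, k=5, iters=20):
    # 1-D k-means on increments, as a stand-in for ABBA's clustering stage
    inc = np.diff(x)
    centroids = np.quantile(inc, np.linspace(0, 1, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(inc[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = inc[labels == j].mean()
    return centroids

def encode(x, centroids):
    # Assign each increment to its nearest centroid (= symbol index)
    inc = np.diff(x)
    return np.argmin(np.abs(inc[:, None] - centroids[None, :]), axis=1)

centroids = fit_centroids(train)      # fitted on the train split ONLY
train_sym = encode(train, centroids)  # symbols used to train the model
test_sym = encode(test, centroids)    # test encoded with the SAME fixed alphabet
```

The key property is that nothing after `ts[:-fcast_len]` influences the learned alphabet; the test split is only ever transformed, never fitted on.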
Moreover, in f.forecast(len(test)).tolist(), the forecast horizon is supposed to be expressed in time steps of the original time series, whereas your function actually predicts len(test) ABBA symbols. Since each ABBA symbol covers several time steps, those two quantities do not correspond.
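To make that mismatch concrete, here is a toy illustration (the piece lengths below are made up, not taken from the paper or the code): each ABBA symbol stands for a linear piece spanning several time steps, so covering a horizon of n time steps requires far fewer than n symbols.

```python
# Hypothetical ABBA-style pieces: each stores (length_in_steps, increment).
pieces = [(7, 0.5), (12, -0.3), (9, 0.8), (15, -0.6)]

horizon = 30  # desired forecast horizon, in original time steps
covered, n_symbols = 0, 0
for length, _ in pieces:
    if covered >= horizon:
        break
    covered += length
    n_symbols += 1

# Four symbols already span 43 >= 30 time steps, so forecasting
# `horizon` symbols (30 of them) would massively over-shoot.
print(n_symbols, covered)  # → 4 43
```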
I hope it helps.