neuralforecast
How to incrementally retrain model with new data
Given streaming data, how do we retrain a previously trained model on a new, smaller batch of data?
Is this possible?
Hey @rnjv,
To save and retrain a model you can use the core.save and core.load methods.
Here is a save and load tutorial.
Calling core.fit on the loaded model will retrain it incrementally.
Will core.fit and retraining in batches reduce the memory requirements? Currently the memory requirements look heavy.
Hey @rnjv,
The memory requirements depend on the models.
If you are using recurrent-based models like LSTM, TCN, or GRU, you can use the inference_input_size parameter to reduce memory usage during inference. The parameter trims the length of the time series at prediction time.
Is doing something like this reliable when GPU memory is a constraint?
nf = NeuralForecast(models=models, freq='1min')
for i in range(0, df.shape[0], 60000):
    # iloc slicing is end-exclusive, so i:i+60000 already yields 60,000 rows
    dft = df.iloc[i:i + 60000].copy()
    nf.fit(df=dft)
Hey @rnjv,
If you are using panel data, your approach will not work.
If you are using a single series you could try using only 60,000 windows that way, although you can achieve the same result with the inference_input_size parameter.
On another note, using 60,000 lags in an auto-regression seems like a hard model to learn. I think it would be good to filter that data or to test whether so many lags are helpful or detrimental for your forecasting task.
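To illustrate why positional slicing breaks panel data, and a per-series alternative: this is pure pandas, and the column names just follow NeuralForecast's long format (unique_id, ds, y).

```python
import pandas as pd

# Two series stacked in long format. A positional slice like df[i:i+n]
# cuts across series boundaries, handing the model partial series.
df = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 5,
    "ds": list(pd.date_range("2024-01-01", periods=5, freq="1min")) * 2,
    "y": range(10),
})

# Instead, keep the most recent k observations of every series.
recent = df.groupby("unique_id").tail(3)
```

Here every series contributes its own latest window, whereas df[0:6] would return all of series "a" plus a single row of series "b".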
How would you suggest training panel data / a multivariate dataframe incrementally? As I understand it, one loads the saved model, fits it on the slice of data the model has not been trained on, and gets the updated model.
Hi @rnjv. What model(s) are you using?
The save and load methods will work regardless of the number of time series in the data.
We have updated the windows-based models (TFT, NHITS, etc.) so that the length of the time series does not increase the memory requirements (other than fitting the data). For recurrent models (RNN, LSTM, etc.), use input_size and inference_input_size to reduce the memory requirements (truncated backpropagation).
We mostly recommend training in batches (with save and load) only for updating a production model with the latest data.