
How to incrementally retrain model with new data

Open rnjv opened this issue 1 year ago • 7 comments

Given streaming data, how do we retrain a previously trained model on a new, smaller batch of data?

Is this possible?

rnjv avatar Apr 20 '23 13:04 rnjv

Hey @rnjv,

To save and reload a model you can use the core.save and core.load methods. Here is a save and load tutorial.

Calling core.fit on the loaded model will retrain it incrementally.

kdgutier avatar Apr 20 '23 13:04 kdgutier

Will core.fit and retraining in batches reduce the memory requirements? Currently the memory requirements look heavy.

rnjv avatar May 06 '23 14:05 rnjv

Hey @rnjv,

The memory requirements depend on the models.

If you are using recurrent-based models like LSTM, TCN, or GRU, you can use the inference_input_size parameter to reduce memory usage during inference. The parameter trims the length of each time series at prediction time.

kdgutier avatar May 09 '23 12:05 kdgutier

Is doing something like this reliable when GPU memory is a constraint?

nf = NeuralForecast(models=models, freq='1min')
for i in range(0, df.shape[0], 60000):
  dft = df.iloc[i:i + 60000].copy()  # each chunk has at most 60,000 rows
  nf.fit(df=dft)

rnjv avatar May 17 '23 01:05 rnjv

Hey @rnjv,

If you are using panel data, your approach will not work.

If you are using a single series, you could limit training to 60,000-observation windows that way, although you can achieve the same result with the inference_input_size parameter.

On another note, using 60,000 lags in an auto-regression seems like a hard model to learn. It would be good to filter the data, or to test whether so many lags are helpful or detrimental for your forecasting task.

kdgutier avatar May 17 '23 12:05 kdgutier

> Hey @rnjv,
>
> To save and retrain a model you can use core.save and core.load methods. Here is a save and load tutorial.
>
> Calling a core.fit on the loaded model will retrain it incrementally.

How would you suggest training incrementally on panel data / a multivariate dataframe? As I understand it, one loads the saved model, fits on the slice of data the model has not been trained on, and gets back the updated model.

rnjv avatar May 17 '23 12:05 rnjv

Hi @rnjv. What model(s) are you using?

The save and load methods will work regardless of the number of time series of the data.

We have updated the windows-based models (TFT, NHITS, etc.) so that the length of the time series does not increase the memory requirement (other than fitting the data). For recurrent models (RNN, LSTM, etc.), use input_size and inference_input_size to reduce the memory requirement (truncated backpropagation).

We mostly recommend training on batches (with save and load) only for updating a production model with the latest data.

cchallu avatar Jun 06 '23 22:06 cchallu