ipex-llm
Chronos : fit_loop epoch not reset properly for Autoformer
```python
autoformer = AutoformerForecaster(past_seq_len=look_back,
                                  future_seq_len=horizon,
                                  input_feature_num=321,
                                  output_feature_num=321,
                                  label_len=label_len,
                                  freq='h',
                                  seed=1024,
                                  loss="mse",
                                  metrics=['mae', 'mse', 'mape'],
                                  lr=0.0001,
                                  d_model=128,
                                  d_ff=64,
                                  e_layer=2,
                                  n_head=2)
autoformer.fit(train_loader, epochs=2, batch_size=32)
autoformer.trainer.fit(autoformer.internal, train_loader)
```
The second fit does not start from epoch 0.
You may initialize another Trainer as a workaround:

```python
# ...
autoformer.fit(train_loader, epochs=2, batch_size=32)
trainer = Trainer(...)
trainer.fit(autoformer.internal, train_loader)
```
reproduce result
Not only Autoformer: if you carry out the same process on TCN, the same issue happens when `forecaster.num_processes = 1`. (No doubt all forecasters inherited from `BasePytorchForecaster`, e.g. S2S, TCN, LSTM, NBeats, will suffer from this issue.)
why this issue happens
In short, the original trainer has already completed its job (it reached `max_epochs`), so it will not start training again.
related information: https://github.com/Lightning-AI/lightning/issues/9636
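The behaviour can be sketched with a toy model of the fit loop. This is NOT the real Lightning source (names like `ToyFitLoop`/`ToyTrainer` are hypothetical); it only illustrates one plausible mechanism, assuming the trainer keeps a persistent epoch counter that `fit` never resets, and that the counter is left pointing at the last epoch that ran:

```python
# Hypothetical sketch -- not the actual pytorch_lightning FitLoop.
class ToyFitLoop:
    def __init__(self, max_epochs):
        self.max_epochs = max_epochs
        self.current_epoch = 0  # persists for the lifetime of the trainer

    @property
    def done(self):
        # the loop stops once the stored counter reaches max_epochs;
        # calling fit() again does NOT reset it
        return self.current_epoch >= self.max_epochs


class ToyTrainer:
    def __init__(self, max_epochs):
        self.fit_loop = ToyFitLoop(max_epochs)

    def fit(self):
        loop, ran = self.fit_loop, []
        while not loop.done:
            ran.append(loop.current_epoch)  # "train" one epoch
            loop.current_epoch += 1
        # assumption: after training, the counter is left at the index
        # of the last epoch that ran (an off-by-one vs. the done check)
        if ran:
            loop.current_epoch -= 1
        return ran


trainer = ToyTrainer(max_epochs=2)
print(trainer.fit())               # [0, 1] -- first fit runs both epochs
print(trainer.fit())               # [1]    -- second fit only re-runs the last epoch
print(ToyTrainer(max_epochs=2).fit())  # [0, 1] -- a fresh trainer starts from epoch 0
```

Under this assumption a fresh trainer restarts from epoch 0, which is exactly why the `Trainer(...)` workaround above behaves differently from calling `forecaster.trainer.fit` again.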
do we need to solve this issue?
First of all, we don't want any user to use `forecaster.trainer` directly in their application code.
This only becomes a problem for incremental training, where `forecaster.fit` is called multiple times. But in that case, a new trainer is initialized every time `fit` is called. We should be careful about the optimizer parameters, though that might be another issue.
related information: What's the best practice for continual learning? - #7 by Neeraj_Varshney - implementation help - PyTorch Lightning
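The optimizer concern can be sketched as follows. This is a hypothetical mock (not the Chronos API): if every `fit` call builds a fresh trainer, the model state carries over between calls, but any optimizer state (momentum, Adam moments, scheduler position) is silently rebuilt from scratch each time:

```python
# Hypothetical sketch of the incremental-training concern.
class ToyOptimizer:
    def __init__(self):
        self.step_count = 0  # stands in for momentum / Adam moment state

    def step(self):
        self.step_count += 1


class ToyForecaster:
    def __init__(self):
        self.weight_updates = 0  # model state: survives across fit() calls

    def fit(self, epochs):
        optimizer = ToyOptimizer()  # re-created on EVERY fit() call
        for _ in range(epochs):
            optimizer.step()
            self.weight_updates += 1
        return optimizer.step_count


f = ToyForecaster()
f.fit(epochs=2)
f.fit(epochs=2)
print(f.weight_updates)  # 4 -- training continued from the trained weights,
                         # but each fit() started with a brand-new optimizer
```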
remaining issue
Why does the second fit still train the last epoch? It should just be skipped. I have not found the answer yet.