
Chronos : fit_loop epoch not reset properly for Autoformer

Open rnwang04 opened this issue 2 years ago • 2 comments

```python
autoformer = AutoformerForecaster(past_seq_len=look_back,
                                  future_seq_len=horizon,
                                  input_feature_num=321,
                                  output_feature_num=321,
                                  label_len=label_len,
                                  freq='h',
                                  seed=1024,
                                  loss="mse",
                                  metrics=['mae', 'mse', 'mape'],
                                  lr=0.0001,
                                  d_model=128,
                                  d_ff=64,
                                  e_layer=2,
                                  n_head=2)
autoformer.fit(train_loader, epochs=2, batch_size=32)
autoformer.trainer.fit(autoformer.internal, train_loader)
```

The second fit does not start from epoch 0.

rnwang04 avatar Jul 11 '22 02:07 rnwang04

You may initialize another Trainer:

```python
# ...
autoformer.fit(train_loader, epochs=2, batch_size=32)
trainer = Trainer(...)
trainer.fit(autoformer.internal, train_loader)
```

reproduce result

Not only Autoformer: if you carry out the same process on TCN, the same issue happens. No doubt all forecasters inherited from BasePytorchForecaster (e.g. S2S, TCN, LSTM, NBeats) will suffer from this issue.

why this issue happens

In short, the original trainer has completed its job (it has reached max_epochs), so it will not start training again.
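The stopping condition can be illustrated with a toy, pure-Python model of a fit loop (the class and attribute names here are hypothetical and only mimic the idea of Lightning's internals, not the real implementation):

```python
class ToyFitLoop:
    """Hypothetical sketch of a fit loop that remembers completed epochs."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs
        self.current_epoch = 0  # persists across run() calls, like trainer state

    @property
    def done(self):
        # Once current_epoch reaches max_epochs, the loop considers itself finished.
        return self.current_epoch >= self.max_epochs

    def run(self):
        epochs_trained = 0
        while not self.done:
            # ... train one epoch here ...
            self.current_epoch += 1
            epochs_trained += 1
        return epochs_trained


loop = ToyFitLoop(max_epochs=2)
loop.run()   # trains 2 epochs and marks the loop as done
loop.run()   # trains 0 epochs: current_epoch was never reset
```

Because `current_epoch` is never reset between calls, the second `run()` finds `done` already true and exits immediately, which mirrors why the second `fit` on the same trainer does not train from epoch 0.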

related information: https://github.com/Lightning-AI/lightning/issues/9636

do we need to solve this issue?

First of all, we don't want any user to call forecaster.trainer directly in their application code.

This only becomes a problem for incremental training, where forecaster.fit is called multiple times. In that case, a new trainer is initialized every time fit is called, so we should be careful about the optimizer state; but that might be a separate issue.
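The optimizer-state concern can be sketched with a toy, pure-Python stand-in for a stateful optimizer such as Adam (the class and its attributes are hypothetical illustrations, not the real torch.optim implementation):

```python
class ToyAdamState:
    """Hypothetical stand-in for a stateful optimizer like Adam."""

    def __init__(self):
        self.step_count = 0   # stands in for Adam's bias-correction step counter
        self.momentum = 0.0   # stands in for the running gradient moment

    def step(self, grad):
        self.step_count += 1
        self.momentum = 0.9 * self.momentum + 0.1 * grad


def fit_with_fresh_optimizer(grads):
    # A freshly initialized trainer builds a fresh optimizer each time.
    opt = ToyAdamState()
    for g in grads:
        opt.step(g)
    return opt


first = fit_with_fresh_optimizer([1.0, 1.0])
second = fit_with_fresh_optimizer([1.0, 1.0])
# second.step_count is 2, not 4: the step counter and momentum accumulated
# during the first fit are discarded when the optimizer is re-created.
```

This is why re-initializing the trainer on every fit call, while it fixes the epoch counter, silently restarts the optimizer's accumulated state between fits.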

related information: "What's the best practice for continual learning?" (PyTorch Lightning forums, implementation help, reply #7 by Neeraj_Varshney)

remaining issue

Why does the second fit still train the last epoch? It should just be skipped. I have not found the answer yet.

TheaperDeng avatar Jul 11 '22 04:07 TheaperDeng

Reproduce result: not only Autoformer; if you carry out the same process on TCN, the same issue happens when forecaster.num_processes = 1. No doubt all forecasters inherited from BasePytorchForecaster (e.g. S2S, TCN, LSTM, NBeats) will suffer from this issue.

rnwang04 avatar Jul 11 '22 05:07 rnwang04