
Chronos : fit_loop epoch not reset properly for Autoformer

Open rnwang04 opened this issue 2 years ago • 2 comments

```python
autoformer = AutoformerForecaster(past_seq_len=look_back,
                                  future_seq_len=horizon,
                                  input_feature_num=321,
                                  output_feature_num=321,
                                  label_len=label_len,
                                  freq='h',
                                  seed=1024,
                                  loss="mse",
                                  metrics=['mae', 'mse', 'mape'],
                                  lr=0.0001,
                                  d_model=128,
                                  d_ff=64,
                                  e_layer=2,
                                  n_head=2)
autoformer.fit(train_loader, epochs=2, batch_size=32)
autoformer.trainer.fit(autoformer.internal, train_loader)
```

The second fit does not start from epoch 0.

rnwang04 avatar Jul 11 '22 02:07 rnwang04

You may initialize another Trainer:

```python
# ...
autoformer.fit(train_loader, epochs=2, batch_size=32)
trainer = Trainer(...)
trainer.fit(autoformer.internal, train_loader)
```

reproduce result

Not only Autoformer: if you carry out the same process on TCN, the same issue happens. No doubt all forecasters inherited from BasePytorchForecaster (e.g. S2S, TCN, LSTM, NBeats) will suffer from this issue.

why this issue happens

In short, the original trainer has completed its job (it has reached max_epochs), so it will not start training again.
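The stopping condition can be illustrated with a toy, pure-Python model of a fit loop (the class and attribute names here are hypothetical and only mimic the idea of Lightning's internals, not the real implementation):

```python
class ToyFitLoop:
    """Hypothetical sketch of a fit loop that remembers completed epochs."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs
        self.current_epoch = 0  # persists across run() calls, like trainer state

    @property
    def done(self):
        # Once current_epoch reaches max_epochs, the loop considers itself finished.
        return self.current_epoch >= self.max_epochs

    def run(self):
        epochs_trained = 0
        while not self.done:
            # ... train one epoch here ...
            self.current_epoch += 1
            epochs_trained += 1
        return epochs_trained


loop = ToyFitLoop(max_epochs=2)
loop.run()   # trains 2 epochs and marks the loop as done
loop.run()   # trains 0 epochs: current_epoch was never reset
```

Because `current_epoch` is never reset between calls, the second `run()` finds `done` already true and exits immediately, which mirrors why the second `fit` on the same trainer does not train from epoch 0.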

related information: https://github.com/Lightning-AI/lightning/issues/9636

do we need to solve this issue?

First of all, we don't want any user to call forecaster.trainer directly in their application code.

This only becomes a problem for incremental training, where forecaster.fit is called multiple times. In that case, a new trainer is initialized every time fit is called, so we should be careful about the optimizer state; but that might be a separate issue.
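The optimizer-state concern can be sketched with a toy, pure-Python stand-in for a stateful optimizer such as Adam (the class and its attributes are hypothetical illustrations, not the real torch.optim implementation):

```python
class ToyAdamState:
    """Hypothetical stand-in for a stateful optimizer like Adam."""

    def __init__(self):
        self.step_count = 0   # stands in for Adam's bias-correction step counter
        self.momentum = 0.0   # stands in for the running gradient moment

    def step(self, grad):
        self.step_count += 1
        self.momentum = 0.9 * self.momentum + 0.1 * grad


def fit_with_fresh_optimizer(grads):
    # A freshly initialized trainer builds a fresh optimizer each time.
    opt = ToyAdamState()
    for g in grads:
        opt.step(g)
    return opt


first = fit_with_fresh_optimizer([1.0, 1.0])
second = fit_with_fresh_optimizer([1.0, 1.0])
# second.step_count is 2, not 4: the step counter and momentum accumulated
# during the first fit are discarded when the optimizer is re-created.
```

This is why re-initializing the trainer on every fit call, while it fixes the epoch counter, silently restarts the optimizer's accumulated state between fits.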

related information: "What's the best practice for continual learning?" (PyTorch Lightning forums, implementation help, reply #7 by Neeraj_Varshney)

remaining issue

Why does the second fit still train the last epoch? It should just be skipped. I have not found the answer yet.

TheaperDeng avatar Jul 11 '22 04:07 TheaperDeng

Reproduce result: not only Autoformer; if you carry out the same process on TCN, the same issue happens when forecaster.num_processes = 1. No doubt all forecasters inherited from BasePytorchForecaster (e.g. S2S, TCN, LSTM, NBeats) will suffer from this issue.

rnwang04 avatar Jul 11 '22 05:07 rnwang04