Time-LLM
The problem of loss reduction during training
Very good work. Due to limited equipment, I ran LLaMA on a single Nvidia 3090 with the batch size set to 4. I only used 4000 data points from ETTh1, with the input sequence length set to 256 and the output length to 48. The learning_rate is 0.001, and the other parameters follow the script.
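For reference, here is a sketch of the configuration described above. The dictionary keys are my own shorthand for the values I passed in, not the exact flags of the repo's training script:

```python
# Sketch of the training setup described above (illustrative names, not the script's exact flags).
config = {
    "llm_model": "LLAMA",    # backbone LLM, run on a single Nvidia 3090
    "data": "ETTh1",         # only ~4000 data points used
    "seq_len": 256,          # input sequence length
    "pred_len": 48,          # output (forecast) length
    "batch_size": 4,         # reduced to fit the GPU memory
    "learning_rate": 1e-3,
    "train_epochs": 100,     # training usually stops much earlier via early stopping
    "patience": 10,          # early-stopping patience
}
```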
During training, I found that the first epoch was the best; after that the validation loss kept going up until training stopped once the early-stopping patience of 10 was exhausted. I ran BERT and GPT-2 in a similar setup (set to 100 epochs, but the best result usually comes within about 10 epochs). The final MAE is 0.6039629. After visualizing the results, I found that predictions are good for relatively regular sequences but poor for less regular ones.
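To make the stopping behavior concrete, here is a minimal sketch of the patience-based early stopping I mean (a simplified stand-in, not the repo's actual EarlyStopping class): training halts once the validation loss has failed to improve for 10 consecutive epochs, which in my runs happens very early because the first epoch is already the best.

```python
class EarlyStopping:
    """Minimal patience-based early stopping (simplified stand-in, not the repo's exact class)."""

    def __init__(self, patience: int = 10, delta: float = 0.0):
        self.patience = patience       # epochs to wait for an improvement
        self.delta = delta             # minimum change that counts as an improvement
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss: float) -> None:
        if val_loss < self.best_loss - self.delta:
            self.best_loss = val_loss  # new best validation loss: reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
            if self.counter >= self.patience:
                self.should_stop = True


# In my runs the best validation loss comes in the first epoch,
# so the counter climbs straight to 10 and training stops early.
stopper = EarlyStopping(patience=10)
for epoch, val_loss in enumerate([0.60, 0.62, 0.63, 0.65] + [0.66] * 10):
    stopper.step(val_loss)
    if stopper.should_stop:
        print(f"early stop at epoch {epoch}")
        break
```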
I am not sure whether the training process is correct, or whether it converges within very few epochs because of the power of the LLM in Time-LLM. Do you have any suggestions? Thanks.