Strange loss curve when `num_train_epoch`>1
Thank you for releasing this wonderful work and for keeping the training and fine-tuning scripts up to date!
Recently I tried to fine-tune InternVL-V1.5 on a custom dataset, and I found that when I set `num_train_epoch` > 1, the training loss curve looked unreasonable: it showed a sudden drop right at the start of the second epoch, and the same thing happened entering the third epoch.
Is this normal, or does the script simply not support more than one epoch?
If this has already been covered in the technical report or the repo README and I missed it, please just point me to it without re-explaining. Looking forward to your reply!
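For reference, a minimal sketch of the setting in question, assuming the Hugging Face `Trainer`-based fine-tuning path (the output path below is a placeholder, and `num_train_epochs` is the standard `transformers` spelling of the flag):

```python
from transformers import TrainingArguments

# Minimal sketch (assumed setup, not the repo's actual launch script):
# raising num_train_epochs above 1 is the change that produces the
# stepped loss curve described above.
args = TrainingArguments(
    output_dir="./internvl_finetune",  # placeholder path
    num_train_epochs=3,                # > 1 reproduces the sudden drops
    per_device_train_batch_size=4,
    learning_rate=2e-5,
)
```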
As far as I know, this phenomenon is very common when training large models. To some extent it reflects the model overfitting to the training set: at each epoch boundary the model starts revisiting samples it has already fit, so the loss on them drops sharply.
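To see why the curve takes that shape, here is a self-contained toy demonstration in plain PyTorch (nothing from InternVL): a small model trained for three epochs on a small fixed dataset memorizes the samples during the first epoch, so the loss drops abruptly the moment the second epoch revisits them.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Small fixed dataset with random targets: the only way to reduce the
# loss is to memorize individual samples, i.e. to overfit.
data = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
loader = DataLoader(data, batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(3):
    epoch_losses = []
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        epoch_losses.append(loss.item())
    # The mean loss falls sharply between epochs because epoch 2+ revisits
    # samples the model has already fit.
    print(f"epoch {epoch + 1}: mean loss = {sum(epoch_losses) / len(epoch_losses):.4f}")
```

The same mechanics apply in a real fine-tune, just less extremely; more data, stronger shuffling, or regularization flatten the step.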
Thanks for your reply! I will try to lower the learning rate to fix it.
I faced the same loss curve in my downstream-task fine-tuning, which suggests I may be encountering the same overfitting issue. Could you please share some practical methods for dealing with overfitting? It would help me a lot. Thank you in advance.
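A minimal sketch of two common mitigations, weight decay plus early stopping on a held-out validation loss, assuming a Hugging Face `Trainer`-based setup (the argument and callback names come from `transformers`, not from the InternVL scripts specifically; `model`, `train_ds`, and `eval_ds` are placeholders you would supply):

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./internvl_finetune",  # placeholder path
    num_train_epochs=3,
    learning_rate=1e-5,                # lower LR, as suggested above
    weight_decay=0.01,                 # L2-style regularization
    evaluation_strategy="steps",       # evaluate on held-out data...
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,       # ...and keep the best checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                       # placeholder: your fine-tuning model
    args=args,
    train_dataset=train_ds,            # placeholder: training split
    eval_dataset=eval_ds,              # placeholder: held-out split
    # Stop once eval loss fails to improve for 3 consecutive evaluations.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```

Reshuffling each epoch and mixing in some general-domain data can also soften the epoch-boundary drop, but a held-out validation loss is the most reliable signal of when to stop.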