InternVL: Strange loss curve when `num_train_epoch`>1

Thank you for releasing this wonderful work and for continuing to update the training and fine-tuning scripts!

Recently I tried to fine-tune InternVL-V1.5 on a custom dataset, and I found that when I set `num_train_epoch`>1, the training loss curve looks unreasonable: it drops suddenly as training enters the second epoch, and again on entering the third epoch.

[image: training loss curve]

Is this normal? Or does this script not support epoch>1?

If this has already been covered in the technical report or the repo README and I missed it, please just point me to it; there is no need to explain it again. Looking forward to your reply!

May 17 '24 07:05 ChorlingLau

As far as I know, this phenomenon is very common when training large models. This reflects the overfitting of the model to the training set to some extent.
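The usual reading of the step-like drop is that during the first epoch every sample is new to the model, while from the second epoch on it is revisiting samples it has already trained on. One way to check how much of the drop is overfitting is to compare the per-epoch training loss against the loss on a held-out validation split. The following is only a minimal sketch of that comparison, assuming per-step training losses and one validation loss per epoch have been logged; the function names and the numbers are hypothetical and not part of the InternVL scripts.

```python
from statistics import mean


def epoch_means(step_losses, steps_per_epoch):
    """Average the per-step training loss within each epoch."""
    return [
        mean(step_losses[i:i + steps_per_epoch])
        for i in range(0, len(step_losses), steps_per_epoch)
    ]


def generalization_gaps(train_epoch_losses, val_epoch_losses):
    """Validation loss minus training loss for each epoch."""
    return [v - t for t, v in zip(train_epoch_losses, val_epoch_losses)]


def looks_overfit(train_epoch_losses, val_epoch_losses):
    """Heuristic: training loss fell, but the last validation loss is no
    better than the best one seen in earlier epochs."""
    if len(train_epoch_losses) < 2 or len(val_epoch_losses) < 2:
        return False  # a single epoch gives nothing to compare against
    train_falls = train_epoch_losses[-1] < train_epoch_losses[0]
    val_stalls = val_epoch_losses[-1] >= min(val_epoch_losses[:-1])
    return train_falls and val_stalls


# Made-up numbers for illustration only: 3 epochs of 3 steps each.
step_losses = [1.0, 0.9, 0.8, 0.5, 0.45, 0.4, 0.2, 0.18, 0.16]
val_losses = [0.85, 0.80, 0.86]

train_losses = epoch_means(step_losses, steps_per_epoch=3)
print(generalization_gaps(train_losses, val_losses))
print(looks_overfit(train_losses, val_losses))
```

If the validation loss keeps improving along with the training loss, the drop at each epoch boundary is less of a concern; if the gap widens from epoch to epoch, training for fewer epochs is one option to consider.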

May 19 '24 11:05 czczup

Thanks for your reply! I will try to lower the learning rate to fix it.

May 20 '24 03:05 ChorlingLau

> As far as I know, this phenomenon is very common when training large models. This reflects the overfitting of the model to the training set to some extent.

I see the same loss curve when fine-tuning on my downstream task, which suggests I may be running into the same overfitting issue. Could you please share some practical methods for addressing overfitting? It would help me a lot. Thank you in advance.

Jun 18 '24 09:06 Kenjamin99
