InternVL resume training techniques

Open ecnuycxie opened this issue 7 months ago • 0 comments

Thanks for your excellent work and your effort on sharing the code.

I have a question about training InternVL2:

In my experiment, I set --save_only_model to avoid saving the "global_step" checkpoint. However, the training loss had not converged after 1 epoch. When I loaded the saved checkpoint and resumed training, the loss increased (possibly because the AdamW optimizer state was not restored). Are there any training tips for this situation?
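
To illustrate the suspicion about the optimizer state, here is a minimal pure-Python sketch of a single-parameter AdamW update. It is not InternVL or PyTorch code: `adamw_step`, `fresh_state`, and the alternating toy gradients are assumptions made up for this illustration. It compares the size of the next update when the moment estimates are carried over with the size when they start again from zero, as they would if only the model weights were saved.

```python
import math

def fresh_state():
    # Hypothetical container for AdamW's per-parameter state.
    return {"step": 0, "m": 0.0, "v": 0.0}

def adamw_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """Apply one AdamW update to a scalar parameter and return the new value."""
    state["step"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    # Decoupled weight decay, then the Adam step.
    param = param * (1 - lr * weight_decay)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# Toy "training run": noisy gradients that alternate in sign.
param = 0.0
state = fresh_state()
for t in range(1, 201):
    grad = 1.0 if t % 2 == 1 else -1.0
    param = adamw_step(param, grad, state)

next_grad = 1.0

# Case 1: optimizer state restored, so training continues with the warm moments.
warm_state = dict(state)
warm_update = adamw_step(param, next_grad, warm_state) - param

# Case 2: only the weights restored, so the optimizer starts from scratch.
cold_update = adamw_step(param, next_grad, fresh_state()) - param

print(f"update with restored state: {warm_update:.6f}")
print(f"update with fresh state:    {cold_update:.6f}")
```

In this toy setting the first step after a cold restart has a magnitude close to the full learning rate, because bias correction makes the very first update roughly lr * sign(grad), while the warm optimizer takes a much smaller step since its first-moment average has mostly cancelled the noise. That is consistent with a jump in loss right after resuming from a weights-only checkpoint, though it does not prove that this is the cause here. If the training script passes --save_only_model through to Hugging Face TrainingArguments, the learning-rate scheduler and RNG state are not saved either, so the learning rate at the point of resumption is also worth checking.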

Jul 14 '24 03:07 ecnuycxie
