InternVL
resume training techniques
Thanks for your excellent work and for sharing the code.
I have a question about training InternVL2:
In my experiment, I set --save_only_model to avoid saving the "global_step" optimizer state in the checkpoint. However, the training loss had not converged after 1 epoch, and when I restored the checkpoint and continued training, the loss increased (probably because the AdamW optimizer states were not restored). Are there any training tips for resuming in this situation?
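
For reference, a minimal sketch of the resume setup I have in mind, assuming the Hugging Face Transformers Trainer used by the fine-tuning scripts (version >= 4.36, which exposes `save_only_model`); the output path, `model`, and `train_dataset` are placeholders, not the actual InternVL2 configuration:

```python
# Sketch: keep full checkpoints so the optimizer state can be restored on resume.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="work_dirs/internvl2_finetune",  # placeholder output path
    save_only_model=False,   # keep AdamW / scheduler / RNG state in the checkpoint
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,      # cap disk usage instead of dropping optimizer state
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# With full checkpoints, resuming restores the AdamW moments and LR scheduler,
# so the loss should continue from where it left off instead of jumping up.
trainer.train(resume_from_checkpoint=True)
```

If disk space is the reason for --save_only_model, would limiting the number of retained checkpoints (e.g. `save_total_limit`) be the recommended alternative, so that optimizer state is still available for resuming?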