ALDM

How to determine the final training checkpoint?

Open • jianlufu121 opened this issue 1 year ago • 1 comment

Hello, thank you for sharing this excellent project! While training with your code, I ran into a problem choosing the final training checkpoint. I currently cannot monitor how the loss changes during training, and the log files do not seem to save any loss-related data either. As a result, I am unsure how to choose an appropriate checkpoint for testing. Is there a recommended method or criterion for determining the final training checkpoint? I would greatly appreciate your guidance. Thank you!

jianlufu121, Dec 05 '24 07:12

Hi @jianlufu121, thanks for your interest! Depending on the size of the training dataset, one can train for a fixed number of iterations. Some intermediate generation results are also logged; you may check them to better decide when to stop.

YumengLi007, Dec 08 '24 09:12
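
The fixed-iteration suggestion above could be applied with a small helper like the one below. This is only a sketch, not part of ALDM: it assumes checkpoint file names embed the training step (for example `step=10000.ckpt`), which the thread does not confirm, and `checkpoint_step`, `pick_checkpoint`, and the target step are hypothetical names and values chosen for illustration. It picks the saved checkpoint whose step is closest to a chosen target; the logged intermediate generations around that step would still need a visual check.

```python
import re
from pathlib import Path

# Assumption: the step number appears in the file name as "step=<n>",
# "step_<n>", "step-<n>", or "step<n>". Adjust to the real naming scheme.
STEP_PATTERN = re.compile(r"step[=_-]?(\d+)")


def checkpoint_step(name):
    """Return the training step parsed from a checkpoint file name, or None."""
    match = STEP_PATTERN.search(Path(name).name)
    return int(match.group(1)) if match else None


def pick_checkpoint(names, target_step):
    """Return the checkpoint name whose parsed step is closest to target_step.

    Names without a recognizable step (e.g. "last.ckpt") are ignored.
    Ties are resolved in favor of the earlier step.
    """
    candidates = []
    for name in names:
        step = checkpoint_step(name)
        if step is not None:
            candidates.append((step, name))
    if not candidates:
        raise ValueError("no checkpoint name contains a step number")
    best = min(candidates, key=lambda item: (abs(item[0] - target_step), item[0]))
    return best[1]


if __name__ == "__main__":
    # Hypothetical file names and target step, for illustration only.
    saved = ["step=5000.ckpt", "step=10000.ckpt", "step=15000.ckpt", "last.ckpt"]
    print(pick_checkpoint(saved, target_step=12000))
```

The target step itself has to come from the dataset size and from inspecting the logged samples, so the helper only automates the file lookup, not the decision.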