AutomaTikZ Can I continue training from a checkpoint?

Open JasonLLLLLLLLLLL opened this issue 1 year ago • 1 comments

train/llama.py seems to pick up the last checkpoint, but the loss starts over again (at around 1.6). It should be around 0.2 at this checkpoint.

{'loss': 1.6927, 'learning_rate': 0.0003589922426773994, 'epoch': 24.06}
 38%|█████████████████████████████████████████████████████████████████████▌ | 1549/4096 [25:45<11:16:38,

Or can I write something like this in train/llama.py?

check_point="/output/checkpoint-1536"
trainer.train(resume_from_checkpoint=check_point)
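
For reference, the hard-coded path above could also be looked up instead of typed by hand. The following is only a minimal sketch: it assumes the output directory contains subfolders named `checkpoint-<step>`, as in the path above, and `find_last_checkpoint` is a hypothetical helper name, not something taken from train/llama.py.

```python
import re
from pathlib import Path
from typing import Optional

# Matches folder names such as "checkpoint-1536" and captures the step number.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subfolder with the highest step, or None.

    Hypothetical helper for illustration; assumes the checkpoint-<step>
    naming seen in "/output/checkpoint-1536".
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step = -1
    best_path = None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-1536 beats checkpoint-512.
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_path = child

    return str(best_path) if best_path is not None else None


# Usage in the training script (assumes `trainer` is already constructed):
#   check_point = find_last_checkpoint("/output")
#   trainer.train(resume_from_checkpoint=check_point)
```

If `trainer` is a Hugging Face `transformers.Trainer` (an assumption, since the issue does not say so), the library also provides `transformers.trainer_utils.get_last_checkpoint`, and `resume_from_checkpoint=True` makes it load the last checkpoint found in the configured output directory.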

Sorry to bother you with these questions. I am new to LLM fine-tuning and hope you can help.

Nov 25 '23 16:11 JasonLLLLLLLLLLL
