AutomaTikZ
Can I continue training from a checkpoint?
It seems train/llama.py can pick up the last checkpoint, but the loss appears to start over again (at 1.6). It should be around 0.2 at this checkpoint.
{'loss': 1.6927, 'learning_rate': 0.0003589922426773994, 'epoch': 24.06}
38%|█████████████████████████████████████████████████████████████████████▌ | 1549/4096 [25:45<11:16:38,
Or can I write something like this in train/llama.py?
check_point="/output/checkpoint-1536"
trainer.train(resume_from_checkpoint=check_point)
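For context, passing an explicit path like the above should work, and `resume_from_checkpoint=True` instead lets the Trainer find the newest checkpoint on its own. A minimal sketch of that discovery logic (roughly what `transformers.trainer_utils.get_last_checkpoint` does, shown here with a hypothetical helper name and temporary folders, not the actual library code):

```python
import os
import re
import tempfile

def last_checkpoint(output_dir):
    """Return the 'checkpoint-<step>' subfolder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    ckpts = [
        (int(m.group(1)), name)
        for name in os.listdir(output_dir)
        if (m := pattern.match(name)) and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not ckpts:
        return None
    # max() compares the (step, name) tuples by step first
    return os.path.join(output_dir, max(ckpts)[1])

# Demo with made-up checkpoint folders:
with tempfile.TemporaryDirectory() as out:
    for step in (512, 1024, 1536):
        os.mkdir(os.path.join(out, f"checkpoint-{step}"))
    print(os.path.basename(last_checkpoint(out)))  # checkpoint-1536
```

Either way, the resumed run should restore the optimizer state and step counter from the checkpoint folder, so the reported loss would normally continue near its previous value rather than reset.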
Sorry to bother you with these questions. I am new to LLM fine-tuning, and I hope I can get your answer.