Wrong epoch number on last line
The epoch number is incremented on the last line before training finishes, so it is no longer correct. This affects all finetuning scripts:
Epoch 4 | iter 961 step 961 | loss train: 1.062, val: 1.057 | iter time: 529.46 ms (step)
Epoch 4 | iter 962 step 962 | loss train: 0.937, val: 1.057 | iter time: 503.53 ms (step)
Epoch 4 | iter 963 step 963 | loss train: 0.971, val: 1.057 | iter time: 522.10 ms (step)
Epoch 4 | iter 964 step 964 | loss train: 0.902, val: 1.057 | iter time: 115.27 ms (step)
Epoch 5 | iter 965 step 965 | loss train: 1.182, val: 1.057 | iter time: 743.31 ms (step)
Training time: 583.36s
Memory used: 14.49 GB
Saving LoRA weights to 'out/finetune/lora-tiny-llama-1.1b/final/lit_model.pth.lora'
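For illustration, here is a hedged sketch of what a classic off-by-one like this can look like. The names (`iter_num`, `iters_per_epoch`) and the numbers are hypothetical, not litgpt's actual code; the point is that dividing the *current* 1-indexed iteration number by the iterations per epoch rolls the counter over exactly on the iteration that completes the epoch, matching the log above:

```python
# Hypothetical reconstruction of the off-by-one; all names and numbers
# are illustrative, not taken from litgpt's source.
iters_per_epoch = 193  # e.g. 965 total iterations over 5 epochs

for iter_num in range(963, 966):  # the last few iterations, 1-indexed
    # Buggy: rolls over to the next epoch on the final iteration itself.
    buggy_epoch = iter_num // iters_per_epoch
    # Fixed: count only *completed* iterations before dividing.
    fixed_epoch = (iter_num - 1) // iters_per_epoch
    print(f"iter {iter_num}: buggy epoch {buggy_epoch}, fixed epoch {fixed_epoch}")

# iter 963: buggy epoch 4, fixed epoch 4
# iter 964: buggy epoch 4, fixed epoch 4
# iter 965: buggy epoch 5, fixed epoch 4   <-- the jump seen in the log
```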
Also, how many iterations does each epoch run for?
Good question. The number of iterations depends on the batch size: one epoch means one full pass over the dataset, so a smaller batch size means more iterations per epoch.
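As a concrete illustration (the dataset size and batch size below are made-up numbers, not values from the run above):

```python
import math

dataset_size = 1030  # hypothetical number of training samples
batch_size = 8       # hypothetical batch size

# One epoch = one full pass over the dataset, so the iteration count is
# the number of batches needed to cover it (the last batch may be partial).
iters_per_epoch = math.ceil(dataset_size / batch_size)
print(iters_per_epoch)  # 129; halving the batch size would roughly double this
```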