`megatron_gpt_finetuning.py` does not work with `trainer.max_epochs`
`examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py` ignores `trainer.max_epochs` and always follows `trainer.max_steps` only.
Tested on: nvcr.io/nvidia/nemo:24.03.framework
Test case:
- `trainer.max_steps=200` and `trainer.max_epochs=1` (1 epoch = 187 steps) - the job ran the full 200 steps instead of stopping at 187
- `trainer.max_steps=200` and `trainer.max_epochs=5` (5 epochs = 935 steps) - the job finished at 200 steps
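For contrast, here is a minimal, self-contained sketch in plain PyTorch Lightning (not NeMo; the tiny model, dataset, and 187-sample size are placeholders chosen only to mirror the numbers above). Vanilla Lightning stops at whichever of `max_steps` / `max_epochs` is reached first, which is the behavior this report expects:

```python
# Plain PyTorch Lightning sketch (not the NeMo script). Lightning stops
# at the first limit hit, so this run ends at 187 steps, not 200.
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

# 187 batches of size 1, so one epoch is 187 steps, as in the report.
data = TensorDataset(torch.randn(187, 4), torch.randn(187, 1))
loader = DataLoader(data, batch_size=1)

trainer = pl.Trainer(max_steps=200, max_epochs=1,
                     logger=False, enable_checkpointing=False)
trainer.fit(TinyModel(), loader)
print(trainer.global_step)  # 187: max_epochs wins here, unlike in NeMo
```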
Am I missing something?
It is not possible to honor `trainer.max_epochs` when `trainer.max_steps` is set, because `trainer.max_epochs` is ignored in that case by default. For `trainer.max_epochs` to take effect you must leave `trainer.max_steps` unset, but the default config sets `trainer.max_steps=20000`. You should instead find a `trainer.max_steps` value that suits your needs.
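As a minimal sketch of that workaround, you can derive an equivalent `trainer.max_steps` from the epoch count you want. The dataset size and global batch size below are hypothetical placeholders, chosen only to match the 187-steps-per-epoch figure in the report; substitute your own values:

```python
# Convert a desired epoch count into an equivalent trainer.max_steps
# override. num_samples and global_batch_size are placeholders.
import math

def epochs_to_max_steps(num_samples: int, global_batch_size: int,
                        epochs: int) -> int:
    """Steps per epoch times epochs, rounding partial batches up."""
    steps_per_epoch = math.ceil(num_samples / global_batch_size)
    return epochs * steps_per_epoch

# Matches the report: 187 steps/epoch -> 935 steps for 5 epochs.
print(epochs_to_max_steps(num_samples=187, global_batch_size=1, epochs=5))
```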