
`megatron_gpt_finetuning.py` does not respect `max_epochs`

AtsunoriFujita opened this issue 10 months ago · 1 comment

examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py ignores trainer.max_epochs and always honors trainer.max_steps only.

Tested on: nvcr.io/nvidia/nemo:24.03.framework

Test cases:

- trainer.max_steps=200 and trainer.max_epochs=1 (187 steps): the job finished after 200 steps
- trainer.max_steps=200 and trainer.max_epochs=5 (935 steps): the job finished after 200 steps

Am I missing something?

AtsunoriFujita avatar Apr 18 '24 15:04 AtsunoriFujita

Setting trainer.max_epochs has no effect when trainer.max_steps is set; by default trainer.max_epochs is ignored.

To train by epochs you would need to leave trainer.max_steps unset, but the config sets trainer.max_steps=20000 by default.

You should therefore pick a trainer.max_steps value that suits your needs.
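For reference, here is a minimal sketch of how one might derive a trainer.max_steps value that corresponds to a desired number of epochs. The dataset size and batch size below are placeholder numbers chosen only to reproduce the 187 steps per epoch reported above; the only config keys assumed are model.global_batch_size and trainer.max_steps.

```python
import math

# Sketch: derive trainer.max_steps for a target number of epochs.
# num_examples and global_batch_size are placeholders; substitute your own
# fine-tuning dataset size and the model.global_batch_size from your config.
num_examples = 2992        # examples in the fine-tuning dataset (assumption)
global_batch_size = 16     # must match model.global_batch_size (assumption)
target_epochs = 5

steps_per_epoch = math.ceil(num_examples / global_batch_size)  # 187 in this example
max_steps = steps_per_epoch * target_epochs                    # 935 for 5 epochs

# Pass the result as a Hydra override, e.g.:
#   python megatron_gpt_finetuning.py ... trainer.max_steps=935
print(max_steps)
```

With this approach trainer.max_epochs can simply be left at its default, since only trainer.max_steps is honored.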

frankh077 avatar May 02 '24 13:05 frankh077

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Jun 02 '24 01:06 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jun 09 '24 01:06 github-actions[bot]