
`megatron_gpt_finetuning.py` does not respect `max_epochs`

AtsunoriFujita opened this issue 10 months ago · 1 comment

examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py ignores trainer.max_epochs and always honors trainer.max_steps only.

Tested on: nvcr.io/nvidia/nemo:24.03.framework

Test cases:

- trainer.max_steps=200 and trainer.max_epochs=1 (187 steps): the job finished after 200 steps
- trainer.max_steps=200 and trainer.max_epochs=5 (935 steps): the job finished after 200 steps

Am I missing something?

AtsunoriFujita avatar Apr 18 '24 15:04 AtsunoriFujita

Setting trainer.max_epochs has no effect when trainer.max_steps is set; by default trainer.max_epochs is ignored.

To train by epochs you would need to leave trainer.max_steps unset, but the config sets trainer.max_steps=20000 by default.

You should therefore pick a trainer.max_steps value that suits your needs.
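For reference, here is a minimal sketch of how one might derive a trainer.max_steps value that corresponds to a desired number of epochs. The dataset size and batch size below are placeholder numbers chosen only to reproduce the 187 steps per epoch reported above; the only config keys assumed are model.global_batch_size and trainer.max_steps.

```python
import math

# Sketch: derive trainer.max_steps for a target number of epochs.
# num_examples and global_batch_size are placeholders; substitute your own
# fine-tuning dataset size and the model.global_batch_size from your config.
num_examples = 2992        # examples in the fine-tuning dataset (assumption)
global_batch_size = 16     # must match model.global_batch_size (assumption)
target_epochs = 5

steps_per_epoch = math.ceil(num_examples / global_batch_size)  # 187 in this example
max_steps = steps_per_epoch * target_epochs                    # 935 for 5 epochs

# Pass the result as a Hydra override, e.g.:
#   python megatron_gpt_finetuning.py ... trainer.max_steps=935
print(max_steps)
```

With this approach trainer.max_epochs can simply be left at its default, since only trainer.max_steps is honored.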

frankh077 avatar May 02 '24 13:05 frankh077

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Jun 02 '24 01:06 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jun 09 '24 01:06 github-actions[bot]