Dmytro Pykhtar

Results 59 comments of Dmytro Pykhtar

I'm not seeing this issue with our latest dev container, so the fix should already be in main. It will be included in the upcoming 24.07 release container.

> > Please check #9272
> >
> > I checked and it using https://github.com/NVIDIA/NeMo/blob/main/requirements/requirements_lightning.txt#L7 it downgraded my transformers to 4.40.2 which fixed this problem as it pulled in https://github.com/huggingface/transformers/blob/v4.40.2/src/transformers/__init__.py#L1456

Hi @raybellwaves...

Hi, setting the `--no-mmap-bin-files` argument resolves the issue: https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/training/arguments.py#L2797
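
For anyone unsure how a `--no-...` switch like this behaves, here is a minimal sketch using plain `argparse`. It is an illustration of the general "negative flag" pattern, not a copy of the Megatron-LM source: the destination name `mmap_bin_files`, the help text, and the helper functions `build_parser` and `parse_mmap_setting` are assumptions made for the example.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a single negative flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Toy example of a --no-* flag")
    # With action="store_false" the destination defaults to True and is set
    # to False only when the flag is passed on the command line.
    parser.add_argument(
        "--no-mmap-bin-files",
        action="store_false",
        dest="mmap_bin_files",  # assumed destination name, for illustration
        help="Disable memory-mapping of .bin files (illustrative help text).",
    )
    return parser


def parse_mmap_setting(argv: List[str]) -> bool:
    """Return whether memory-mapping stays enabled for the given arguments."""
    return build_parser().parse_args(argv).mmap_bin_files


if __name__ == "__main__":
    print(parse_mmap_setting([]))                       # flag omitted
    print(parse_mmap_setting(["--no-mmap-bin-files"]))  # flag passed
```

So omitting the flag leaves memory-mapping on, and passing it turns memory-mapping off; there is no value to supply after the flag.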

Hi, do I understand correctly that the configs for 4 GPUs and 8 GPUs are the same? Also, which callbacks do you use to log `train_step_timing in s` and `tps`....

@Proyag thanks for the scripts. Which HF model do you use?

From the logs you shared, it looks like both the log output and `train_step_timing in s` come from GPU 0. I also managed to reproduce the script...