Harman Singh

Results: 4 comments by Harman Singh

Hi, I am facing an issue where increasing the number of GPUs and nodes does not change the number of steps. For example, if I run `python run.py with...
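
For reference, a rough sketch of how the per-epoch iteration count would be expected to scale under plain data-parallel training, assuming a PyTorch-style distributed sampler that splits the dataset evenly across ranks. The function name `steps_per_epoch`, the `accumulate` parameter, and the sample count are hypothetical and not taken from the repo; whether `run.py` actually behaves this way is not confirmed here.

```python
import math


def steps_per_epoch(num_samples, per_gpu_batchsize, num_gpus, num_nodes, accumulate=1):
    """Estimate iterations per epoch for evenly sharded data-parallel training.

    Assumption: each rank sees ceil(num_samples / world_size) samples and
    the last partial batch is kept (drop_last=False).
    """
    world_size = num_gpus * num_nodes
    samples_per_rank = math.ceil(num_samples / world_size)
    # Batches each GPU iterates over; this is what a per-rank progress bar counts.
    batches_per_rank = math.ceil(samples_per_rank / per_gpu_batchsize)
    # Optimizer updates are fewer when gradients are accumulated over several batches.
    optimizer_steps = math.ceil(batches_per_rank / accumulate)
    return batches_per_rank, optimizer_steps


# Hypothetical dataset of 1,000,000 samples, batch size 32 per GPU.
for gpus, nodes in [(1, 1), (8, 1), (8, 2)]:
    print(gpus, nodes, steps_per_epoch(1_000_000, 32, gpus, nodes))
```

If the reported count stays the same as GPUs and nodes are added, it may be worth checking whether the sampler is actually sharding the data, or whether the script keeps a fixed global batch size and only adjusts gradient accumulation.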

> @Jxu-Thu I ran your settings.
>
> `python run.py with data_root=/mnt/nfs/dandelin num_gpus=1 num_nodes=1 task_mlm_itm whole_word_masking=True step100k per_gpu_batchsize=32` => `Epoch 0: 0%| | 130/290436 [03:56
> `python run.py with data_root=/mnt/nfs/dandelin...

Not a particular suggestion for this repo, but it's likely that one of your GPUs is waiting for the other GPUs, e.g. to sync data. Are you using the NCCL backend?...

How large a subset did you take? I can try helping with experiments if required. Also, how large was the CLIP model? If this doesn't work, then one thing can...