Sourab Mangrulkar

244 comments by Sourab Mangrulkar

Hello Stas, Thank you for the information. I observe it with the Trainer too. Steps to reproduce the behaviour with the Trainer: 1. Official `run_glue.py` [script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py) with the following change. The change...

> Also you're running pt-nightly - I wonder if this is something new in pytorch? Does it work with pt-1.11?

Yes, this is on pt-nightly. However, I believe it has...

Hello @tjruwase, Getting the below error with v0.6.0:

```
Traceback (most recent call last):
  File "/home/sourab/deepspeed-test/src/text-classification/run_glue_no_trainer.py", line 619, in <module>
    main()
  File "/home/sourab/deepspeed-test/src/text-classification/run_glue_no_trainer.py", line 511, in main
    accelerator.backward(loss)
  File "/home/sourab/accelerate/src/accelerate/accelerator.py", line 616, ...
```

Hello @tjruwase, I tried rerunning with the latest release in both multi-GPU and single-GPU setups. I no longer observe the accuracy issue (the above might have used a different DeBERTa pretrained checkpoint...

Hello @tjruwase, Thank you for the fix 😄! Yes, the above PR is working as expected to suppress the warnings.

Hello @shrinath-suresh, this issue has to be fixed on the PyTorch side. The issue raised with PyTorch has been linked above.

Also, when using `auto_wrap`, please specify either `--fsdp_transformer_layer_cls_to_wrap` or `--fsdp_min_num_params` as part of the command-line arguments. This is what enables sharding of parameters, gradients and optimizer state across GPUs...
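
As a rough illustration of what such a launch could look like with the Trainer's FSDP integration, here is a minimal sketch; the process count, script, model, task, output directory and layer-class name are placeholders, and `--fsdp "full_shard auto_wrap"` is assumed to be the flag that enables auto wrapping:

```
# Hypothetical command; only one of the two wrap-policy flags is needed.
torchrun --nproc_per_node 8 run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name mrpc \
  --do_train \
  --output_dir /tmp/mrpc-fsdp \
  --fsdp "full_shard auto_wrap" \
  --fsdp_transformer_layer_cls_to_wrap BertLayer
# Alternatively, for size-based wrapping, replace the last flag with
#   --fsdp_min_num_params 100000000
```

Without one of these two flags, `auto_wrap` has no policy for deciding which submodules to wrap, so the parameters, gradients and optimizer state are not actually sharded across GPUs.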

> There is a bit too much in this PR to wrap my head around. Can we split it between multiGPU launcher fixes, DeepSpeed launcher fixes and other fixes? Thanks!...

Hello @Aaryan369, when using the `standard` launcher, I hope you are launching the script on both nodes as you would typically do when using `torchrun` in the multinode setting. I...
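
To make "launching the script on both nodes" concrete, here is a rough sketch of a two-node `torchrun` invocation; the IP address, port, process counts and script name are placeholders, not values from the original thread:

```
# On node 0 (which also hosts the rendezvous):
torchrun --nnodes 2 --nproc_per_node 8 --node_rank 0 \
  --master_addr 10.0.0.1 --master_port 29500 run_glue_no_trainer.py

# On node 1, run the same command and change only the rank:
torchrun --nnodes 2 --nproc_per_node 8 --node_rank 1 \
  --master_addr 10.0.0.1 --master_port 29500 run_glue_no_trainer.py
```

The same pattern applies with the `standard` launcher: every node has to start its own copy of the command, otherwise the job will typically hang waiting for the missing ranks.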