Shivam Shandilya

Results 6 comments of Shivam Shandilya

Hey @karan6181, thanks for the info. To avoid the above in my code, I use the LightningDataModule, which sets these variables in its setup call. I create the datamodule...

Hey, you are right. They are not set before `trainer.fit()`. Only inside the values are `WORLD_SIZE=4, LOCAL_WORLD_SIZE=1, RANK=0`, that is, before dataset instantiation. I guess the `LOCAL_WORLD_SIZE` is not set...
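
For anyone trying to reproduce this, here is a minimal sketch for checking which of these variables are visible at a given point, for example right before `trainer.fit()` and again right before the dataset is instantiated. The helper names `snapshot_dist_env` and `missing_dist_vars` are hypothetical and not part of any library. It only reads the environment.

```python
import os

# The three variables discussed in this thread.
DIST_VARS = ("WORLD_SIZE", "LOCAL_WORLD_SIZE", "RANK")


def snapshot_dist_env(env=None):
    """Return each variable's value, or None if it is not set.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) for name in DIST_VARS}


def missing_dist_vars(env=None):
    """Return the names of the variables that are not set."""
    return [name for name, value in snapshot_dist_env(env).items() if value is None]


if __name__ == "__main__":
    # Call this at the points you want to compare.
    print("snapshot:", snapshot_dist_env())
    print("missing:", missing_dist_vars())
```

Printing the snapshot on every rank makes it easier to see whether one rank is missing a variable that the others have.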

@karan6181 Why does setting them manually before dataset initialization not work? I tried to set the `LOCAL_WORLD_SIZE` variable before the dataset initialization. The previous error doesn't occur now but the...
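
For reference, a sketch of what setting the variable manually before dataset initialization could look like. This is an assumption about the approach, not the exact code used here: `ensure_local_world_size` is a hypothetical helper, and the right value depends on how many processes run per node in your setup.

```python
import os


def ensure_local_world_size(default, env=None):
    """Set LOCAL_WORLD_SIZE only if the launcher has not already set it.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Returns the value in effect afterwards (always a string).
    """
    env = os.environ if env is None else env
    # setdefault leaves a launcher-provided value untouched.
    env.setdefault("LOCAL_WORLD_SIZE", str(default))
    return env["LOCAL_WORLD_SIZE"]


if __name__ == "__main__":
    # Assumption: one process per node. Adjust to your own setup.
    print(ensure_local_world_size(1))
```

Using `setdefault` rather than a plain assignment avoids overwriting a value that a launcher has already provided.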

> I believe some ranks are waiting for other ranks for synchronization and if the env variables are not set correctly, you will see a hang. And do you know...

Hey @jiamings, so I tried setting these variables for each rank in the `setup` function itself, but that too didn't seem to help the issue. The training started for...

Hey @jiamings, yes, torchrun seems to be working well for me too for now in a streaming-PTL setup. I am trying to check if this holds true in a...