Shivam Shandilya
Hey @karan6181, thanks for the info. To avoid the above in my code, I use the LightningDataModule, which sets these variables in its setup call. I create the datamodule...
Hey, you are right. They are not set before `trainer.fit()`. Only inside it are the values `WORLD_SIZE=4, LOCAL_WORLD_SIZE=1, RANK=0`, that is, before dataset instantiation. I guess the `LOCAL_WORLD_SIZE` is not set...
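For anyone who wants to check the same thing on their side, here is a minimal sketch of how the three variables mentioned above could be inspected at different points (for example before `trainer.fit()` and again just before dataset instantiation). The helper names `dist_env_snapshot` and `missing_dist_vars` are hypothetical, made up for illustration; they are not part of Lightning or any other library, and the sketch only reads `os.environ`.

```python
import os

# Variables discussed in this thread; extend the tuple if you need more.
DIST_VARS = ("WORLD_SIZE", "LOCAL_WORLD_SIZE", "RANK")


def dist_env_snapshot(environ=None):
    """Map each variable name to its current value, or None if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in DIST_VARS}


def missing_dist_vars(environ=None):
    """List the variables that are not set at the moment of the call."""
    snapshot = dist_env_snapshot(environ)
    return [name for name, value in snapshot.items() if value is None]


if __name__ == "__main__":
    # Call this at the point you care about, e.g. right before creating the dataset.
    print("snapshot:", dist_env_snapshot())
    print("missing:", missing_dist_vars())
```

Printing the snapshot from every rank at both points should make it easier to see which variable is unset where, though it only reports the state and does not fix anything.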
@karan6181 Why does setting them manually before dataset initialization not work? I tried to set the `LOCAL_WORLD_SIZE` variable before the dataset initialization. The previous error doesn't occur now but the...
> I believe some ranks are waiting for other ranks for synchronization and if the env variables are not set correctly, you will see a hang. And do you know...
Hey @jiamings, so I tried setting these variables for each rank in the `setup` function itself, but that didn't seem to help either. The training started for...
Hey @jiamings, yes, torchrun seems to be working well for me too for now in a streaming-PTL setup. I am trying to check if this holds true in a...