
Fallback to slurm for TorchDistributedEnv

Open • jrapin opened this issue 2 years ago • 1 comment

jrapin • Aug 03 '22 08:08

From my understanding, this change enables someone not using submitit to still be able to retrieve those environment variables that are normally set by torchrun.

Can torchrun be used from Python rather than from the command line?

This seems a bit weird to me: this is a helper function inside submitit, so I would expect it to be relevant only when used together with submitit. Maybe what we need to do instead is see whether we can set up those env vars in user code (maybe by using torchrun?).

I'm fine with it being in user code. Then again, with only a couple of changed lines we can easily accommodate more use cases without duplicating code, which also has some upsides :)
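
For reference, the fallback being discussed could look roughly like the sketch below. It is not submitit's actual implementation: `resolve_distributed_env` is a hypothetical helper, and the `master_addr` and `master_port` defaults are placeholder assumptions (real code would derive the address from the job's node list). It prefers the variables torchrun sets (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) and otherwise maps the standard Slurm task variables (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`) onto them.

```python
import os
from typing import Dict, Mapping

# Names torchrun exports for each worker, paired with the Slurm task
# variables that carry the same information.
_TORCH_TO_SLURM = {
    "RANK": "SLURM_PROCID",         # global rank of this task
    "WORLD_SIZE": "SLURM_NTASKS",   # total number of tasks in the job
    "LOCAL_RANK": "SLURM_LOCALID",  # rank of this task on its node
}


def resolve_distributed_env(
    environ: Mapping[str, str],
    master_addr: str = "127.0.0.1",  # assumption: placeholder, not derived from the node list
    master_port: int = 29500,        # assumption: arbitrary default port
) -> Dict[str, str]:
    """Hypothetical helper: return torchrun-style variables, falling back to Slurm."""
    if all(name in environ for name in _TORCH_TO_SLURM):
        # torchrun (or the caller) already set everything: use it as-is.
        resolved = {name: environ[name] for name in _TORCH_TO_SLURM}
    elif all(slurm in environ for slurm in _TORCH_TO_SLURM.values()):
        # Fallback: translate the Slurm variables to the torchrun names.
        resolved = {name: environ[slurm] for name, slurm in _TORCH_TO_SLURM.items()}
    else:
        raise RuntimeError(
            "Neither torchrun nor Slurm environment variables are fully set"
        )
    # Keep an existing rendezvous address/port if one is already defined.
    resolved["MASTER_ADDR"] = environ.get("MASTER_ADDR", master_addr)
    resolved["MASTER_PORT"] = environ.get("MASTER_PORT", str(master_port))
    return resolved


if __name__ == "__main__":
    try:
        print(resolve_distributed_env(os.environ))
    except RuntimeError as err:
        print(f"Not running under torchrun or Slurm: {err}")
```

Taking the environment as a mapping argument instead of reading `os.environ` directly keeps the helper usable both from submitit and from plain user code, which is the trade-off this thread is weighing.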

jrapin • Aug 23 '22 15:08