hydra-torch
Basic distributed processing with Hydra
Implemented a basic script that demonstrates distributed processing with Hydra, as mentioned in #42. The command to run the script is:
python ddp_00.py -m rank=... init_method=...
where rank is a list of ranks (either a comma-separated list of integers or range(start,stop)) and init_method is a string that selects one of two initialization methods: TCP initialization or shared file-system initialization (environment-variable initialization is not one of the choices for init_method here).
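For reference, PyTorch's torch.distributed.init_process_group accepts these two methods as init_method URLs of the form tcp://host:port and file:///path. The following is a minimal, stdlib-only sketch of how a script could turn the Hydra choice into that URL. The helper name build_init_method, the default address and port, and the shared file path are hypothetical and are not taken from ddp_00.py.

```python
from pathlib import PurePosixPath


def build_init_method(kind: str,
                      master_addr: str = "127.0.0.1",   # hypothetical default
                      master_port: int = 23456,         # hypothetical default
                      shared_file: str = "/tmp/ddp_init") -> str:  # hypothetical path
    """Return an init_method URL for torch.distributed.init_process_group.

    'tcp'  -> tcp://host:port, where the address must point to the rank-0 process.
    'file' -> file:///path, a file that every process must be able to reach,
              e.g. on a shared file system.
    """
    if kind == "tcp":
        return f"tcp://{master_addr}:{master_port}"
    if kind == "file":
        # as_uri() requires an absolute path and produces file:///...
        return PurePosixPath(shared_file).as_uri()
    raise ValueError(f"unsupported init_method: {kind!r}")


# Hypothetical use inside each Hydra job (requires torch; not executed here):
# torch.distributed.init_process_group(
#     backend="gloo",
#     init_method=build_init_method(cfg.init_method),
#     rank=cfg.rank,
#     world_size=WORLD_SIZE,  # must equal the number of ranks launched
# )
```

Whichever method is used, every launched rank must pass the same URL and the same world_size, otherwise process-group initialization will not complete.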
I'll add documentation (Markdown) that explains distributed processing in PyTorch and how Hydra can launch distributed processes, as demonstrated in the script.
rank can also be a range(start,stop).
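As with Python's range, the stop value of a Hydra range sweep is excluded, so rank=range(0,4) launches ranks 0, 1, 2 and 3. The sketch below is a hypothetical, stdlib-only re-implementation of that expansion, shown only to make the two accepted forms concrete; Hydra's own override parser does the real work, and the name expand_rank_sweep is invented for illustration.

```python
import re
from typing import List

# Matches range(start,stop) or range(start,stop,step), tolerating spaces.
_RANGE = re.compile(r"^range\(\s*(-?\d+)\s*,\s*(-?\d+)\s*(?:,\s*(-?\d+)\s*)?\)$")


def expand_rank_sweep(value: str) -> List[int]:
    """Hypothetical helper: expand a rank sweep value into one rank per job.

    Accepts a comma-separated list ('0,1,2,3') or a range ('range(0,4)',
    'range(0,8,2)'); as in Python's range, stop is exclusive.
    """
    match = _RANGE.match(value.strip())
    if match:
        start, stop, step = match.groups()
        return list(range(int(start), int(stop), int(step) if step else 1))
    return [int(item) for item in value.split(",")]
```

On the command line, bash treats parentheses specially, so the range form needs quoting, e.g. python ddp_00.py -m 'rank=range(0,4)' init_method=...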
Not sure the user even needs to provide the rank argument explicitly. rank has to vary from 0 to num_gpus - 1 (for standard use cases), so we could just infer it ourselves. I understand why it has to be specified manually right now, but this could be a useful example for callbacks.
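Inferring the ranks is straightforward arithmetic. The sketch below is hypothetical (infer_ranks, world_size and rank_override are invented names): in the standard single-node case the ranks are simply 0 to num_gpus - 1, and the usual multi-node convention offsets them by node_rank * num_gpus. In a real script num_gpus would typically come from torch.cuda.device_count(), which is left out here so the sketch runs without torch.

```python
from typing import List


def infer_ranks(num_gpus: int, node_rank: int = 0, num_nodes: int = 1) -> List[int]:
    """Hypothetical helper: global ranks for the processes on one node.

    Single node (node_rank=0, num_nodes=1): 0 .. num_gpus - 1.
    Multi-node: node k owns ranks k * num_gpus .. (k + 1) * num_gpus - 1.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU (process) per node")
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    first = node_rank * num_gpus
    return list(range(first, first + num_gpus))


def world_size(num_gpus: int, num_nodes: int = 1) -> int:
    """Total number of processes across all nodes."""
    return num_gpus * num_nodes


def rank_override(ranks: List[int]) -> str:
    """Hypothetical: the rank sweep a user would otherwise type by hand."""
    return "rank=" + ",".join(str(r) for r in ranks)
```

For a single machine with four GPUs, rank_override(infer_ranks(4)) gives rank=0,1,2,3, which is exactly the sweep the current script asks the user to spell out.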
I agree that it's not the best user experience, but we don't have callbacks right now, and even if we did, we would need to make sure the design actually supports this. (It's not obvious that it does.)
@shagunsodhani What would it take to implement a callback?
Oops, sorry, missed this comment :) @omry will have better insight into that. cc'ing @jieru-hu, who is working on callbacks.
Callbacks will likely be pushed back to Hydra 1.2.
@briankosw I'd love to help you push this forward. How are you doing? Bogged down in other work? I know that feeling, haha! Let me know how I can help.
We will have callbacks in 1.1, but I am no longer sure we should use them here. They are not documented yet, but will be before 1.1 is released (Example app).
I am no longer sure we should leverage callbacks here, but it should now be possible to experiment with them on master.