Expose all torch.distributed.init_process_group parameters in the DistributedManager
Modulus Pull Request
Description
Adds `**kwargs` to `DistributedManager.initialize`, which are passed down to `torch.distributed.init_process_group`. Also adds a test that specifically checks that the `timeout` parameter is passed down to torch.
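For illustration, the forwarding described above might look roughly like the following. This is a minimal sketch, not the actual Modulus code: `fake_init_process_group` is a hypothetical stand-in for `torch.distributed.init_process_group` so the example runs without a process group, and `initialize` is a simplified, hypothetical version of `DistributedManager.initialize`.

```python
from datetime import timedelta


def fake_init_process_group(backend, rank, world_size, **kwargs):
    """Hypothetical stand-in for torch.distributed.init_process_group.

    Returns the arguments it received so they can be inspected.
    """
    return {"backend": backend, "rank": rank, "world_size": world_size, **kwargs}


def initialize(**kwargs):
    """Hypothetical, simplified version of DistributedManager.initialize."""
    # Assumption: rank and world_size are determined internally
    # (e.g. from environment variables); fixed values are used here.
    rank, world_size = 0, 1
    # Any extra keyword arguments are forwarded unchanged.
    return fake_init_process_group(
        "nccl", rank=rank, world_size=world_size, **kwargs
    )


# The timeout reaches the underlying call untouched.
received = initialize(timeout=timedelta(seconds=30))
```

A test along the lines of the one mentioned above could then assert that `received["timeout"]` equals the value passed in.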
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
- [x] The CHANGELOG.md is up to date with these changes.
- [ ] An issue is linked to this pull request.
Dependencies
None
How does it behave if you pass a kwarg that has already been passed explicitly, for example `rank` or `world_size`? Will that overwrite the previous one?
That's a good point. Maybe we should pop the explicitly specified kwargs out before passing them down?
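A rough sketch of the two behaviors under discussion, assuming the explicit arguments are passed as keywords alongside `**kwargs`. In that case Python raises a `TypeError` for the duplicate keyword instead of overwriting it. All names here (`fake_init_process_group`, `initialize_naive`, `initialize_popping`, `RESERVED`) are hypothetical and are not part of Modulus or torch.

```python
def fake_init_process_group(backend, rank, world_size, **kwargs):
    """Hypothetical stand-in for torch.distributed.init_process_group."""
    return {"backend": backend, "rank": rank, "world_size": world_size, **kwargs}


# Names that the manager sets explicitly (assumption for this sketch).
RESERVED = ("rank", "world_size")


def initialize_naive(**kwargs):
    """Forwards kwargs as-is; a duplicate keyword raises TypeError."""
    rank, world_size = 0, 1  # assumed to be determined internally
    return fake_init_process_group(
        "nccl", rank=rank, world_size=world_size, **kwargs
    )


def initialize_popping(**kwargs):
    """Drops explicitly managed names before forwarding."""
    rank, world_size = 0, 1  # assumed to be determined internally
    for name in RESERVED:
        kwargs.pop(name, None)  # silently discard user-supplied duplicates
    return fake_init_process_group(
        "nccl", rank=rank, world_size=world_size, **kwargs
    )


try:
    initialize_naive(rank=3)
    naive_raised = False
except TypeError:
    # "got multiple values for keyword argument 'rank'"
    naive_raised = True

popped = initialize_popping(rank=3, timeout=5)
```

Popping silently discards the caller's value, so raising a clear error or logging a warning could be an alternative worth considering.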
@akshaysubr do we want to merge this PR before the release?
@mnabian Yes, we should merge this before the release. I think this is a fairly low-risk PR, and it exposes certain mechanisms for more advanced usage. I think we can merge this as is and add other functionality that comes up in subsequent PRs.