Add dist hooks support for custom device
Fixes #104389
This PR retries adding dist hooks support for custom devices, because the original PR https://github.com/pytorch/pytorch/pull/104155 was marked with the stale label.
Original description:
- The distributed hooks currently hard-code some device names such as cuda. We want these hooks to work for custom devices (the privateuse1 backend) as well, so we call the relevant functions through the abstract device module instead.
- In torch/nn/parallel/distributed.py, I wanted to define a variable self.device_module = getattr(torch, self.device_type, None) so that it could be reused, but this causes a serialization error: TypeError: cannot pickle 'module' object. So we look the module up with getattr each time it is needed.
The function _get_device_module was landed in this PR: https://github.com/pytorch/pytorch/pull/107289
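To illustrate the serialization problem described above, here is a minimal sketch that runs without PyTorch. The names fake_torch, StoresModule, LooksUpModule and state_is_picklable are hypothetical stand-ins made up for this example; they are not part of the PR or of the torch API. The sketch only shows why caching a module on the instance breaks pickling, while storing the device-type string and resolving the module on demand does not.

```python
import pickle
import types

# Hypothetical stand-in for the `torch` package so the sketch runs anywhere.
# `fake_torch.cuda` plays the role of a device module such as `torch.cuda`.
fake_torch = types.ModuleType("fake_torch")
fake_torch.cuda = types.ModuleType("fake_torch.cuda")


def _current_device():
    return 0


fake_torch.cuda.current_device = _current_device


class StoresModule:
    """Caches the device module on the instance (the approach that failed)."""

    def __init__(self, device_type):
        self.device_type = device_type
        self.device_module = getattr(fake_torch, device_type, None)


class LooksUpModule:
    """Stores only the device-type string and resolves the module when needed."""

    def __init__(self, device_type):
        self.device_type = device_type

    def current_device(self):
        device_module = getattr(fake_torch, self.device_type, None)
        if device_module is None:
            raise RuntimeError(f"no device module for {self.device_type!r}")
        return device_module.current_device()


def state_is_picklable(obj):
    """Try to pickle the instance state (its __dict__), which is what
    pickle serializes by default for plain Python objects."""
    try:
        pickle.dumps(vars(obj))
        return True
    except TypeError:
        # Module objects cannot be pickled, so state holding one fails here.
        return False


print(state_is_picklable(StoresModule("cuda")))
print(state_is_picklable(LooksUpModule("cuda")))
```

The on-demand lookup costs one extra getattr per call, which is presumably why the description mentions wanting to cache the module in the first place.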
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @lucasllc
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/114730
:white_check_mark: No Failures
As of commit d8cfeaf66ec2b3b1aa05e30012ebda024383ffeb with merge base 2de7468d2ba77c92219652dc5cecaa0300c02f8c:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD @fduwjj @wz337 Sorry to bother you. I would appreciate it if you could review this PR.
cc @wconstab who's the right person to review DDP PRs?
cc @fduwjj @rohan-varma @wz337 Could you take a look at this PR?
@pytorchbot label "ciflow/trunk"
Please seek CI approval before scheduling CIFlow labels
@pytorchbot label "ciflow/trunk"
Please seek CI approval before scheduling CIFlow labels
@albanD Sorry to bother you. How can I get CI approval for this PR? (Some cases cannot be verified locally.)
@wconstab can do that when he reviews the PR!
@pytorchbot label "ciflow/trunk"
Sorry to bother you, @wconstab. It seems that adding the label directly is not allowed. Maybe there is an Approve and Run button for CIFlow?
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label. Stale pull requests will automatically be closed after 30 days of inactivity.