Add dist hooks support for custom device

Open • dilililiwhy opened this pull request 2 years ago • 4 comments

Fixes #104389

This is another attempt to add distributed hooks support for custom devices, because the original PR https://github.com/pytorch/pytorch/pull/104155 was marked stale.

Original description:

  1. The distributed hooks currently contain hard-coded device calls such as cuda. We want these hooks to also support custom devices (the privateuse1 backend), so we use the abstract device module to call the relevant functions.
  2. In torch/nn/parallel/distributed.py, I wanted to define a variable self.device_module = getattr(torch, self.device_type, None) so that it could be reused, but storing it causes a serialization error: TypeError: cannot pickle 'module' object. Instead, we fetch the module with getattr each time it is needed.

The function _get_device_module landed in PR https://github.com/pytorch/pytorch/pull/107289.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @lucasllc

dilililiwhy, Nov 29 '23 03:11

Helpful Links

See artifacts and rendered test results at hud.pytorch.org/pr/114730

No Failures

As of commit d8cfeaf66ec2b3b1aa05e30012ebda024383ffeb with merge base 2de7468d2ba77c92219652dc5cecaa0300c02f8c: Looks good so far! There are no failures yet.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot], Nov 29 '23 03:11

@albanD @fduwjj @wz337 Sorry to bother you. I would appreciate it if you could review this PR.

dilililiwhy, Nov 30 '23 10:11

cc @wconstab who's the right person to review DDP PRs?

albanD, Dec 05 '23 21:12

cc @fduwjj @rohan-varma @wz337 Could you take a look at this PR?

dilililiwhy, Feb 17 '24 08:02

@pytorchbot label "ciflow/trunk"

dilililiwhy, Feb 21 '24 04:02

Please seek CI approval before scheduling CIFlow labels

pytorch-bot[bot], Feb 21 '24 04:02

@pytorchbot label "ciflow/trunk"

wconstab, Feb 22 '24 14:02

Please seek CI approval before scheduling CIFlow labels

pytorch-bot[bot], Feb 22 '24 14:02

@albanD Sorry to bother you. How can I get CI approval for this PR? (Some cases cannot be verified locally.)

dilililiwhy, Feb 23 '24 03:02

@wconstab can do that when he reviews the PR!

albanD, Feb 23 '24 15:02

@pytorchbot label "ciflow/trunk"

Sorry to bother you, @wconstab. It seems that adding the label directly is not allowed. Is there perhaps an "Approve and Run" button for CIFlow?

dilililiwhy, Feb 26 '24 02:02

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions[bot], Apr 27 '24 03:04