
Single-node distributed processing with Hydra


Distributed processing with Hydra in a single-node, multi-GPU setting, as mentioned here.

  • [ ] Explain PyTorch's distributed processing/training.
  • [ ] Give a simple demonstration of the various distributed communication primitives.
  • [ ] Incorporate Hydra into PyTorch's distributed processing (see the sketch after this list).
  • [ ] Use multirun to launch the processes.
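
To make the scope concrete, a minimal sketch of the Hydra entry point could look like the following. The file name, config fields, and the choice of the gloo backend are assumptions on my part, and it presumes a sibling config.yaml defining rank, world_size, master_addr, and master_port:

```python
# distributed_demo.py (hypothetical name); expects a config.yaml next to it
# defining rank, world_size, master_addr, and master_port.
import os

import hydra
import torch.distributed as dist
from omegaconf import DictConfig


@hydra.main(config_name="config")
def main(cfg: DictConfig) -> None:
    # Every process in the group must agree on the rendezvous address/port.
    os.environ["MASTER_ADDR"] = cfg.master_addr
    os.environ["MASTER_PORT"] = str(cfg.master_port)

    # Each process joins the group under its own rank; the gloo backend
    # runs on CPU, so the demo does not require GPUs.
    dist.init_process_group(backend="gloo", rank=cfg.rank, world_size=cfg.world_size)
    print(f"initialized rank {dist.get_rank()} of {dist.get_world_size()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched once per rank, the processes rendezvous at the same address and form the group.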

This will serve as an introductory example for #38.

briankosw · Dec 05 '20

@romesco would love your feedback on this!

briankosw · Dec 05 '20

Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

I want to make sure we don't overcomplicate things on this one. For example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback, of course =].

romesco · Dec 05 '20

I think the idea here is not to actually train, but just to demonstrate the basic primitives.

omry · Dec 05 '20

> Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

If you check this PR out, you'll see a basic distributed processing setup using Hydra, with distributed communication primitives exchanged between multiple processes. This is basically as simple as it gets, and much simpler than MNIST.
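
Roughly, the primitives demonstration amounts to something like this (illustrative only, assuming the process group has already been initialized as in the issue sketch; the exact code is in the PR):

```python
import torch
import torch.distributed as dist


def demo_primitives(rank: int) -> None:
    # all_reduce: every rank contributes a tensor, and afterwards every
    # rank holds the elementwise sum across the whole group.
    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: all_reduce sum = {t.item()}")

    # broadcast: rank 0's tensor overwrites the tensor on all other ranks.
    b = torch.tensor([float(rank)])
    dist.broadcast(b, src=0)
    print(f"rank {rank}: after broadcast = {b.item()}")  # 0.0 on every rank
```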

> I want to make sure we don't overcomplicate things on this one. For example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback, of course =].

So this PR/example will be about how Hydra helps set up distributed processes without using configs? Should the configs aspect be implemented in the other PR?

> I think the idea here is not to actually train, but just to demonstrate the basic primitives.

In that case, I will only demonstrate how Hydra can be used to set up distributed processing.
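
For example, the launch could look something like this (illustrative overrides, matching the sketch above; note that Hydra's default basic launcher runs multirun jobs sequentially, so a parallel launcher such as the Joblib launcher plugin would be needed for the ranks to actually rendezvous):

```
python distributed_demo.py --multirun hydra/launcher=joblib world_size=4 rank=0,1,2,3
```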

briankosw · Dec 05 '20