
11 distributed_tutorial issues

Thanks for the great tutorial. One thing I still don't understand: how are the master address and port determined? Are these set by my machine, i.e. if I have a...
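For what it's worth, the address and port are not discovered automatically: you choose them, and every process reads them (by default from the environment) before joining the group. A minimal sketch, assuming single-node training on "localhost" and an arbitrary free port:

```python
import os
import torch.distributed as dist

def setup(rank, world_size):
    # You pick the master address/port, not PyTorch: every process must be
    # able to reach this host/port, and rank 0 listens on it. "localhost"
    # works for single-node jobs; for multi-node jobs use the rank-0 node's
    # IP. The port just needs to be free ("8888" here is an example).
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "8888"
    # With the default env:// init method, this reads MASTER_ADDR/MASTER_PORT
    # from the environment set above.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
```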

Where does dist.destroy_process_group() go in your DDP MNIST example: https://github.com/yangkky/distributed_tutorial/blob/master/src/mnist-mixed.py ?
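A common placement, sketched below with the body of the training function elided, is at the very end of the per-process train function, after the last collective operation:

```python
import torch.distributed as dist

def train(gpu, args):
    # ... set MASTER_ADDR/MASTER_PORT, call dist.init_process_group(...),
    # build the model, wrap it in DistributedDataParallel, run the epochs ...

    # After the final collective call (last backward pass, checkpoint saving),
    # tear the process group down as the last step before the worker returns.
    dist.destroy_process_group()
```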

I saw the tutorial (https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#save-and-load-checkpoints):

```python
def demo_checkpoint(rank, world_size):
    print(f"Running DDP checkpoint example on rank {rank}.")
    setup(rank, world_size)

    model = ToyModel().to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.MSELoss()
    optimizer = ...
```

Hi, thanks for the easy-to-follow tutorial on distributed processing. I followed your example and it works fine on a single multi-GPU system. On running it on multiple nodes with 2...
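For multi-node runs, the usual extra step is computing a global rank from the node rank and the local GPU index, and pointing MASTER_ADDR at the rank-0 node. A sketch under those assumptions (the argument names here are placeholders, not the tutorial's exact flags):

```python
import os
import torch.distributed as dist

def setup_multinode(local_gpu, node_rank, gpus_per_node, world_size,
                    master_addr, master_port="8888"):
    # MASTER_ADDR must be the IP/hostname of the node hosting global rank 0,
    # reachable from every other node; the port is any free port.
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    # Global rank = (which node am I on) * (GPUs per node) + (my local GPU index).
    rank = node_rank * gpus_per_node + local_gpu
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    return rank
```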

You write in your blog https://yangkky.github.io/2019/07/08/distributed-pytorch-tutorial.html: "It's also possible to have multiple worker processes that fetch data for each GPU." How can I enable this? I am running...
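If this refers to data-loading workers, it is the DataLoader's num_workers argument; each DDP process then runs its own pool of loader workers. A minimal sketch, assuming the MNIST dataset from the tutorial and placeholder batch/worker counts:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from torchvision import datasets, transforms

def make_loader(rank, world_size, batch_size=64, workers_per_gpu=4):
    dataset = datasets.MNIST("./data", train=True, download=True,
                             transform=transforms.ToTensor())
    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # num_workers controls how many worker processes this rank uses to fetch
    # and preprocess batches in the background; pin_memory speeds up
    # host-to-GPU copies.
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler,
                      num_workers=workers_per_gpu, pin_memory=True)
```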

I noticed that the PyTorch tutorial https://pytorch.org/tutorials/intermediate/ddp_tutorial.html uses "Save and Load Checkpoints" to synchronize the models across the different processes. ![image](https://user-images.githubusercontent.com/53101398/69497127-1a2d8080-0e9f-11ea-985d-3c6e095af6ed.png) So, I want to know if there...
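For context, the pattern in that tutorial is: rank 0 saves, every rank waits at a barrier, then every rank loads with a map_location that remaps the tensors onto its own GPU. DDP already broadcasts rank 0's parameters when the model is wrapped, so the save/load there demonstrates checkpointing rather than being strictly required for synchronization. A hedged sketch of the pattern (the path is a placeholder):

```python
import torch
import torch.distributed as dist

CHECKPOINT_PATH = "/tmp/ddp_checkpoint.pt"  # placeholder path

def checkpoint_and_reload(ddp_model, rank):
    if rank == 0:
        # Only one process writes the file to avoid clobbering.
        torch.save(ddp_model.state_dict(), CHECKPOINT_PATH)
    # Make sure the file exists before any rank tries to read it.
    dist.barrier()
    # Each rank loads the rank-0 weights, remapping them onto its own GPU.
    map_location = {"cuda:0": f"cuda:{rank}"}
    ddp_model.load_state_dict(
        torch.load(CHECKPOINT_PATH, map_location=map_location))
```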

How do I add validation evaluation with DDP? Is it the same as for training? @yangkky
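One common approach (a sketch, not code from the tutorial): give the validation set its own DistributedSampler so each rank scores a shard, then all_reduce the counters so every process ends up with the global metric. The loader and device below are placeholders:

```python
import torch
import torch.distributed as dist

def validate(ddp_model, val_loader, device):
    ddp_model.eval()
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = ddp_model(images).argmax(dim=1)
            correct += (preds == labels).sum()
            total += labels.numel()
    # Sum the per-rank counters so every process sees the global accuracy.
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    return (correct / total).item()
```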

https://github.com/yangkky/distributed_tutorial/blob/24467967c1c719110c33fccca69353ad8e5ae2e4/src/mnist-mixed.py#L108-L114

Could you add the save-model line to the example to make it more complete? Thanks!

```python
torch.save(model.state_dict(), CHECKPOINT_PATH)
```
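If that line is added, it is usually guarded so only one process writes the file; some people also save from model.module so the keys don't carry the DDP "module." prefix. A sketch with placeholder names:

```python
import torch

def save_checkpoint(model, gpu, checkpoint_path):
    # `model` is the DistributedDataParallel-wrapped model from the training
    # loop and `gpu` is this process's rank; both are placeholders here.
    if gpu == 0:
        # Only one process writes the file; saving model.module.state_dict()
        # drops the "module." prefix DDP adds, so the checkpoint loads into a
        # plain (unwrapped) model later.
        torch.save(model.module.state_dict(), checkpoint_path)
```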

Hi, thanks for the excellent example of using DistributedDataParallel in PyTorch; it is very easy to understand and is much better than the PyTorch docs. One important bit that is missing...

Hi, I tried running my code like your example, and I got this error:

```
  File "artGAN512_impre_v8.py", line 286, in main
    mp.spawn(train, nprocs=args.gpus, args=(args,))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 167, in spawn...
```
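Without the full traceback it is hard to say what failed, but for comparison, a minimal working shape of an mp.spawn call looks like the sketch below (the function body and arguments are placeholders). Note that the spawned function receives the process index as its first argument:

```python
import torch.multiprocessing as mp

def train(gpu, args):
    # mp.spawn passes the process index (0 .. nprocs-1) as the FIRST argument,
    # followed by whatever is in `args`; a mismatch in this signature is a
    # common cause of spawn errors.
    print(f"worker {gpu} started with args {args}")

if __name__ == "__main__":
    # Guarding the spawn call with __main__ matters because child processes
    # may re-import this script when they start.
    mp.spawn(train, nprocs=2, args=({"lr": 0.01},))
```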