pytorch-distributed

A quickstart and benchmark for pytorch distributed training.

14 pytorch-distributed issues

Is it the case that the NCCL backend cannot be used on Windows? If so, how can multi-GPU training be done on Windows? Thanks!
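For reference, DDP can still be used on Windows through the gloo backend instead of NCCL. A minimal sketch, assuming a single Windows machine with multiple GPUs; the TCP address/port below is a placeholder, not a value from this repo:

```
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    dist.init_process_group(
        backend="gloo",                       # NCCL is Linux-only; gloo is available on Windows
        init_method="tcp://127.0.0.1:23456",  # placeholder local address/port
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(10, 10).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])
    # ... training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```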

Bumps [horovod](https://github.com/horovod/horovod) from 0.18.2 to 0.24.0. Release notes (sourced from horovod's releases): elastic mode improvements, MXNet async dependency engine, fixes for the latest PyTorch and TensorFlow versions. Added Ray: Added elastic...

dependencies

Running your script as-is, it keeps erroring out and I can't find the cause.
```
root@pai-worker1:/home/Data/exports/pytorch-distributed# srun -N1 -n2 --gres gpu:2 python distributed_slurm_main.py --dist-file dist_file
Traceback (most recent call last):
  File "distributed_slurm_main.py", line 420, in <module>
    main()
  File "distributed_slurm_main.py", line 131, in main
    mp.spawn(main_worker,...
```

I want to modify it so that it can run on multiple machines, but I don't know how to do it.
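As a starting point, here is a hedged sketch of the usual multi-machine setup with the `torch.distributed.launch` helper; the script name `train.py`, the node count, and `NODE0_IP` are placeholders rather than this repo's actual values:

```
# Each node runs the same command with its own --node_rank, and all nodes
# point at the rank-0 node via --master_addr/--master_port:
#
#   node 0:  python -m torch.distributed.launch --nproc_per_node=4 \
#              --nnodes=2 --node_rank=0 --master_addr=NODE0_IP --master_port=23456 train.py
#   node 1:  python -m torch.distributed.launch --nproc_per_node=4 \
#              --nnodes=2 --node_rank=1 --master_addr=NODE0_IP --master_port=23456 train.py
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # filled in by the launcher
args = parser.parse_args()

dist.init_process_group(backend="nccl", init_method="env://")  # reads the env set by the launcher
torch.cuda.set_device(args.local_rank)

model = torch.nn.Linear(10, 10).cuda(args.local_rank)
ddp_model = DDP(model, device_ids=[args.local_rank])
# ... build a DistributedSampler-backed DataLoader and train as in single-node DDP ...
```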

When I run distributed.py, GPU memory usage is unbalanced: the main GPU uses 10GB while the other three GPUs use 8GB each. How can this be fixed?
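One common cause of this pattern (an assumption, not necessarily what happens in this repo) is every process touching `cuda:0`, for example through `torch.load` defaulting there or tensors being created before the device is pinned. A hedged sketch of the usual fix; `local_rank` and the checkpoint path are placeholders:

```
import os
import torch

local_rank = 0  # placeholder; real code gets this from the launcher / environment
torch.cuda.set_device(local_rank)        # pin this process to its own GPU early
device = torch.device("cuda", local_rank)

ckpt_path = "checkpoint.pth"             # hypothetical path
if os.path.exists(ckpt_path):
    # map_location keeps the load off cuda:0, a common source of the extra memory on the main GPU
    state = torch.load(ckpt_path, map_location=device)
```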

Hi, why does the code call loss.backward() rather than reduce_loss.backward() when computing the gradients?
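For context on what this issue asks: in DistributedDataParallel the gradient averaging happens inside `backward()` itself, so each process backpropagates its local `loss`; the all-reduced loss is typically computed only for logging and is detached from the graph. A hedged sketch of that pattern (not necessarily this repo's exact code):

```
import torch.distributed as dist

def reduce_mean(tensor, world_size):
    # Average a scalar across all processes, purely for reporting/metrics.
    rt = tensor.clone().detach()
    dist.all_reduce(rt, op=dist.ReduceOp.SUM)
    rt /= world_size
    return rt

# inside the training loop:
# loss = criterion(output, target)
# reduced_loss = reduce_mean(loss, dist.get_world_size())  # for printing only
# optimizer.zero_grad()
# loss.backward()        # DDP all-reduces the gradients during this call
# optimizer.step()
```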

For example, I am training on an 8-GPU node and want to use only 4 of the GPUs. Training works if I use GPUs 0,1,2,3, but any other combination of GPU IDs fails. I changed the GPU ID for each process following https://github.com/PyTorchLightning/pytorch-lightning/issues/2407, but I get `RuntimeError: cuda runtime error (10) : invalid device ordinal at /pytorch/torch/csrc/cuda/Module.cpp:59`. My code:
```
import torch
import torch.nn as nn
import torch.distributed as...
```
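A common workaround (an assumption about the cause, not a confirmed fix for this report) is to restrict `CUDA_VISIBLE_DEVICES` before any CUDA call, so the chosen physical GPUs are renumbered from 0 and rank-based device IDs stay valid. A sketch, using GPUs 4-7 as an example:

```
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "4,5,6,7")  # must be set before torch touches CUDA

import torch

local_rank = 0  # placeholder; comes from the launcher in real code
torch.cuda.set_device(local_rank)  # device IDs 0..3 now map onto physical GPUs 4..7
```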

I implemented a distributed training run following your approach and found that single-machine single-GPU training and multi-machine multi-GPU training take about the same time to finish the same number of epochs, hence this question.
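One thing worth checking in such a comparison (an assumption, not a diagnosis of this specific report): without a `DistributedSampler`, every process iterates the full dataset, so an epoch costs roughly single-GPU wall time. A minimal sketch with placeholder data and rank values:

```
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

world_size, rank = 2, 0   # placeholders; real values come from torch.distributed
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)  # splits data across processes
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    sampler.set_epoch(epoch)   # reshuffle differently each epoch
    for x, y in loader:
        pass                   # each process now sees only 1/world_size of the batches
```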