bps.push_pull gives wrong results for PyTorch
When I trained MNIST with PyTorch, I found that the reported accuracy and loss were weird. I then printed the values out before push_pull, so I suspect the cause is that push_pull gives wrong results for CPU tensors.
https://github.com/bytedance/byteps/blob/cf020c97fc718ca209cbadbfac4cffa5e49d7d21/example/pytorch/train_mnist_byteps.py#L133
Actually, this is a known bug that was also found in MXNet, as I mentioned before in #247. The current workaround is to move the tensor to CUDA:
tensor = torch.tensor(val).cuda()
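A minimal sketch of how the workaround might look inside the example's metric-averaging helper, assuming the helper builds a scalar tensor from `val` and averages it with push_pull as in the linked script. The `push_pull` parameter is a hypothetical addition so the function can be exercised without a BytePS cluster; in the example you would pass `bps.push_pull` (from `import byteps.torch as bps`). Note that the CPU fallback still goes through the buggy CPU path, so it only keeps the code runnable on machines without a GPU.

```python
import torch


def metric_average(val, name, push_pull):
    """Average a scalar metric across workers via push_pull.

    Workaround for the CPU-tensor push_pull bug: the tensor is moved to
    the GPU before push_pull whenever CUDA is available.
    `push_pull` is injected (hypothetical parameter); pass bps.push_pull.
    """
    tensor = torch.tensor(val)
    if torch.cuda.is_available():
        # Avoid the CPU code path that currently returns wrong values.
        tensor = tensor.cuda()
    avg_tensor = push_pull(tensor, name=name)
    # .item() copies the scalar back to a Python float regardless of device.
    return avg_tensor.item()
```

In training code this would be called as `metric_average(test_loss, 'avg_loss', bps.push_pull)`.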
We also found that push_pull on CPU tensors may lead to instability. We will investigate and fix it.