Mingxiao Huang comments

Results: 12 comments by Mingxiao Huang

worse chainer convnet-benchmarks performance on cupy-2.0.0 as compared to cupy-1.0.0.1

It seems I posted in the wrong place; I meant to go to https://github.com/mitmul/convnet-benchmarks. Sorry.

worse chainer convnet-benchmarks performance on cupy-2.0.0 as compared to cupy-1.0.0.1

It seems that the convnet benchmark performance returns to normal after we upgrade cupy to '3.0.0a1'.

Port Chainer#4191 or use Chainer's BN implementation

@kuenishi We have the same question here. We recently ran a multi-node test experiment and found that googlenet_v2, googlenet_v3 and resnet50 show unexpectedly low(

Socket Timeout when using DDP

I ran into the same problem.

Very Low Validation Accuracy for Resnet50 and Resnet101 models using TF-TRT

I am seeing the same issue.

risk brought by no pre-allocate output for allgather&reducescatter&alltoall

@maxhgerlach Thanks for the detailed reply. Our test case looks like "mpirun -np 10 -ppn 10 pytest -v -k "allgather" test_torch.py". Taking test_horovod_allgather as an example, the value of https://github.com/horovod/horovod/blob/master/test/parallel/test_torch.py#L1113 is...

risk brought by no pre-allocate output for allgather&reducescatter&alltoall

@maxhgerlach Sorry for the late feedback; I have been checking this issue. "torch UT" means this file: https://github.com/horovod/horovod/blob/master/test/parallel/test_torch.py. We know that Horovod has code to deal with asynchronous memory allocations on...