Mingxiao Huang

12 comments by Mingxiao Huang

Seems I went to the wrong place; I meant to go to https://github.com/mitmul/convnet-benchmarks. Sorry.

It seems the convnet benchmark performance returned to normal after we upgraded cupy to '3.0.0a1'.

@kuenishi We have the same question here. We recently ran a multi-node test experiment and found that googlenet_v2, googlenet_v3, and resnet50 show unexpectedly low (...

@maxhgerlach Thanks for the detailed reply. Our test case looks like `mpirun -np 10 -ppn 10 pytest -v -k "allgather" test_torch.py`. Taking test_horovod_allgather as an example, the value of https://github.com/horovod/horovod/blob/master/test/parallel/test_torch.py#L1113 is...

@maxhgerlach Sorry for the late feedback; I have been looking into this issue. By "torch UT" I mean the unit tests in this file: https://github.com/horovod/horovod/blob/master/test/parallel/test_torch.py. We know that Horovod has code to deal with asynchronous memory allocations on...

@maxhgerlach Our code is based on v0.28, so the fix in https://github.com/horovod/horovod/pull/3639 is already in our code base.

@romerojosh Thanks. We found some hints in oneCCL but need to debug further; we will follow up once we make progress.

> (Also meant to be a comment, not approval until changes are discussed)

@muellerzr @sgugger I just pushed a new commit to avoid the bug exposed in the accelerate test due...