clearsky07
I don't know why so many tests failed. The added code runs in my RCCL library and achieves good performance. You can read the doc I wrote for further analysis.
> @clearsky07 - In the attached doc, you indicated that you observed only two NICs (out of four) are getting used. How many ranks did you run per node? What...
> The PR is consistently failing our CI tests - we're still investigating.

Thanks a lot. I can only run it on my specific machine, and I got some performance improvement...
> > > @clearsky07 - In the attached doc, you indicated that you observed only two NICs (out of four) are getting used. How many ranks did you run per...
> The PR is consistently failing our CI tests - we're still investigating.

Hi, is there any progress?
> @clearsky07 - We will have to look up the test failures. In the meantime, can you explain at a high level what you are trying to achieve in this PR...
@nusislam I am sorry, there are two versions of RCCL on my machine: **rccl-develop** and **rccl with this PR**. Maybe I used the wrong version (**rccl with this PR**) to describe...