Peter Andreas Entschev
> Just to be clear, it is doubling the memory usage per object being transmitted during its transmission. So it is not as simple as doubling all memory or for...
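In case it helps make that concrete, here is a minimal sketch of what "doubling per object" means, assuming the serialized copy coexists with the original until the send completes (plain `pickle` is used purely for illustration, not ucx-py's actual serialization path):

```python
import pickle
import sys

# ~100 MiB payload standing in for one object being transmitted
obj = bytearray(100 * 1024 * 1024)

# Serializing for the wire produces a second copy of the data;
# until the send completes, both copies are alive at once.
frames = pickle.dumps(obj)

# Peak usage is roughly 2x the object size, but only per object
# currently in flight -- not 2x the whole process footprint.
peak_bytes = sys.getsizeof(obj) + sys.getsizeof(frames)
print(f"peak ~= {peak_bytes / 2**20:.0f} MiB for a 100 MiB object")
```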
> I think we are on the same page. I'd like people to try it and report feedback before we consider merging.

Awesome, please keep us posted. Let me know...
> Thus far have tried the MRE from issue (rapidsai/ucx-py#402) where it seems to help.

Could you elaborate on what you mean by "seems to help"? Another question:...
My tests show an improvement with this PR over the current master branch, so definitely +1 from that perspective. I'm not able to evaluate the memory footprint right now, but I'm...
> I hadn't confirmed this yet. Though NVLink was enabled when I ran in all cases before. Of course that isn't confirmation that it works 😉

I forgot to mention...
And of course, thanks for the nice work, @jakirkham!
I'm now seeing the following errors just as workers connect to the scheduler. Errors on scheduler:

```python-traceback
ucp.exceptions.UCXMsgTruncated: Comm Error "[Recv #002] ep: 0x7fac27641380, tag: 0xf2597f095b80a8c, nbytes: 1179, type: ":...
```
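For context, `UCXMsgTruncated` is raised when a posted receive buffer is smaller than the incoming message. A minimal standalone sketch of that failure mode, assuming plain ucx-py outside of Dask (the port and buffer sizes are illustrative, with the 1179 bytes borrowed from the log above):

```python
import asyncio
import numpy as np
import ucp

PORT = 13337  # hypothetical port, chosen for illustration


async def main():
    async def handler(ep):
        # Receiver posts a buffer smaller than the incoming message,
        # which is what UCXMsgTruncated reports (nbytes > buffer size).
        small = np.empty(512, dtype=np.uint8)
        await ep.recv(small)  # raises ucp.exceptions.UCXMsgTruncated

    listener = ucp.create_listener(handler, PORT)
    ep = await ucp.create_endpoint(ucp.get_address(), PORT)
    await ep.send(np.zeros(1179, dtype=np.uint8))  # 1179 bytes, as in the log
    await asyncio.sleep(1)  # give the handler a chance to fail visibly
    listener.close()


asyncio.run(main())
```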
@jakirkham performance-wise, I'd say this is a good improvement. I did some runs with 4 DGX-1 nodes using the code from https://github.com/rapidsai/ucx-py/issues/402#issue-556594147, please see details below:

```
IB Create time:...
```
> Thanks Peter! Can you please share a bit about where this was run?

This was run on a small cluster of 4 DGX-1 nodes; I updated my post above...
From the discussion in https://github.com/rapidsai/ucxx/pull/61, it seems that `cuda-nvcc` is required even when linking only with the host compiler. @jakirkham wrote:

> Essentially the CUDA compiler package is needed to...
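If that's right, a conda recipe would need to carry `cuda-nvcc` as a build dependency even when only the host compiler does the linking. A hypothetical `meta.yaml` fragment illustrating the idea (package names follow conda-forge conventions; the surrounding recipe is invented):

```yaml
# Hypothetical recipe fragment; only the cuda-nvcc line reflects the
# takeaway above, the rest is illustrative scaffolding.
requirements:
  build:
    - {{ compiler('cxx') }}
    - cuda-nvcc        # needed so the build system can locate the CUDA
                       # Toolkit, even though nvcc compiles nothing here
  host:
    - cuda-cudart-dev
```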