Saliya Ekanayake
@Sergei-Lebedev, on a side note, `dist.new_group()` with UCC might also benefit from this PR: Pass group ranks and options to third party distributed backends by esaliya · Pull...
@gummif so is `get()` only recommended for trivial types? I further simplified this bug that happens without variants or strings. For example, just sending the following struct would cause misaligned...
So is there a way we can provide a pre-allocated buffer to `zmq::message_t` that'll be used during receive? There's an API for that but I found `recv` operation does not...
After digging into zmq code, I see that even if you provide a buffer, zmq will not avoid copying as it receives to an internal buffer and then copies it...
> @esaliya It looks like there are some CI failures. Seems there are some test backends that fail due to the new API. I'll take a look to fix this.
@kumpera could you help to trigger the CI for this PR?
@JackCaoG, I got a suggestion that you might have some tips here. I am making a change to 3rd party backend registration in `torch.distributed.register_backend()`. This causes test failures coming from the xla repo....
> > @JackCaoG, got a suggestion you might have some tip here. I am making a change to 3rd party backend registration in `torch.distributed.register_backend()`. This cause test failures coming from...
@JackCaoG, Technically, it should. I am waiting for CI pipelines to go through to see it.
@JackCaoG I created a companion PR in xla with the fix to its failing test cases at https://github.com/pytorch/xla/pull/3982. Let me know how to get this merged, so the PR here...