gloo
Collective communications library with various primitives for multi-machine training.
I noticed that the constructor does not take a reduction function, and there's no way to set it. https://github.com/facebookincubator/gloo/blob/master/gloo/cuda_allreduce_halving_doubling.h#L75 Is this intended?
Doesn't work at the moment...
This would verify that we copy all the headers we need to copy in the install step.
Otherwise REALLY weird errors pop up (see for example https://github.com/pytorch/pytorch/issues/2835)
Hi! Firstly, thanks for the nice work. It's good to see the brief benchmark figures in README.md. It would be great if anybody can show the benchmarking result of `--transport...
Do this instead of assuming hostname(2) is resolvable. This is typically not the case on people's custom Ubuntu installs and whatnot. We keep the API the same but just change...
Two computers, one running Ubuntu and the other Windows 11, use PyTorch 2.3.0 for distributed model training. Since Windows 11 does not support the NCCL backend, gloo is used, but the...
Hi, can Gloo use libfabric? I see it has ibverbs available as a transport. Why not libfabric, to allow all types of transport?
Hi, all. I am the maintainer of vcpkg. Recently we received a build error regarding gloo. https://github.com/microsoft/vcpkg/issues/38852 **Reproduce**: ``` git clone https://github.com/microsoft/vcpkg cd vcpkg/ ./vcpkg install gloo:x64-linux ``` **Error**: ```...
`<cstdint>` needs to be included explicitly for `uint8_t` when compiling with GCC 15.
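A minimal sketch of the kind of fix this describes, assuming the failure comes from a source file that uses `uint8_t` without including `<cstdint>` itself and relies on another standard header to pull it in. The `byteSum` helper is hypothetical and is not part of gloo; it only illustrates the explicit include.

```cpp
#include <cstddef>
#include <cstdint>  // declares std::uint8_t and std::uint32_t explicitly
#include <vector>

// Hypothetical helper, for illustration only (not a gloo function).
// Sums the bytes of a buffer; it compiles regardless of which other
// headers happen to include <cstdint> transitively.
std::uint32_t byteSum(const std::vector<std::uint8_t>& buf) {
  std::uint32_t total = 0;
  for (std::uint8_t b : buf) {
    total += b;
  }
  return total;
}
```

Including the header in every file that names a fixed-width integer type keeps the build independent of the standard library's internal include graph, which can change between compiler releases.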