gloo

Collective communications library with various primitives for multi-machine training.

Results 90 gloo issues

Cannot pass nvcc (CUDA) flags to gloo through an environment variable, such as '-D_GLIBCXX_USE_CXX11_ABI=0'. This is inconvenient when compiling gloo as a third-party dependency, for example when building PyTorch.

Hi friends, I am experimenting with the Gloo async `isend` and `irecv` in my work on pipeline parallelism. With `torch==1.8.1` on macOS, I get an error `libc++abi.dylib: terminating with...

I found that Gloo uses a random port for every TCP connection, but this does not suit my use case. In my work, I need to open a specific port...

enhancement

Hi, we are trying to enable ibverbs and make it work with RoCEv2 (e.g., using both librdma_cm and libibverbs). However, we cannot successfully compile the source code. Our question...

Suppose I have 3 machines and each machine has 8 NICs. Does Gloo use all of the NICs within one process for collective communication (using IB, not TCP)? (For example,...

I execute the following command: `./benchmark --size 2 --rank 0 --redis-host host-ip --redis-port 6379 --prefix hey --transport tcp --elements 1 --iteration-time 1s allreduce_ring_chunked` I get the following error: ` terminate...

I changed gloo/examples/example1.cpp from using "tcp" to "uv". However, it fails at Pair::createSendBuffer in gloo/transport/uv/pair.h. This function is not implemented and only has "abort()" in it. I find that...

When opening a connection with `init_process_group` from a local (Mac) host to a remote (Ubuntu) host, I get the following error on the remote host, which closes the connection: `RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:210] address...

From PR https://github.com/facebookincubator/gloo/pull/264: in the future, aligned_allocator can be implemented using _aligned_malloc and _aligned_free on Windows, since Windows doesn't have posix_memalign.
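
The idea in this issue can be sketched as a pair of small wrappers. This is only an illustration under stated assumptions, not gloo's actual aligned_allocator: the names `alignedAlloc` and `alignedFree` are hypothetical, and the sketch assumes the caller passes a power-of-two alignment that is a multiple of `sizeof(void*)`, as posix_memalign requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper (not part of gloo): allocate `size` bytes aligned to
// `alignment`. Returns nullptr on failure.
inline void* alignedAlloc(std::size_t alignment, std::size_t size) {
  // Both APIs expect a power-of-two alignment.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_WIN32)
  // Note the argument order: size first, then alignment.
  return _aligned_malloc(size, alignment);
#else
  void* ptr = nullptr;
  // posix_memalign returns 0 on success and an error number otherwise.
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;
#endif
}

// Hypothetical helper (not part of gloo): release memory from alignedAlloc.
inline void alignedFree(void* ptr) {
#if defined(_WIN32)
  // Memory from _aligned_malloc must be released with _aligned_free.
  _aligned_free(ptr);
#else
  free(ptr);
#endif
}
```

On Windows the two calls must be paired: memory obtained from _aligned_malloc cannot be passed to plain free, which is why the sketch wraps deallocation as well as allocation.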

Code review comment from Orvid in PR https://github.com/facebookincubator/gloo/pull/264: Just make this explicitly an unsigned long long. long is 64-bit on Linux, but 32-bit on Windows, so it's far better to...
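
The width difference the reviewer describes can be shown with a short sketch. It is an illustration only, not code from the PR: `kLongBits` and `makeU64` are hypothetical names, and the sketch assumes the value in question needs a full 64 bits on every platform.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Width of `long` on the current platform: 64 bits on typical 64-bit Linux,
// 32 bits on 64-bit Windows.
constexpr int kLongBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// `unsigned long long` is guaranteed to be at least 64 bits everywhere.
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long must hold 64 bits");

// Hypothetical helper: combine two 32-bit halves into one 64-bit value.
// Casting to a 64-bit type before shifting avoids shifting a 32-bit value
// by 32 bits, which would be undefined behavior.
inline unsigned long long makeU64(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<unsigned long long>(hi) << 32) | lo;
}
```

Had `makeU64` used `unsigned long` instead, it would behave as intended on Linux but could not hold the upper half of the result on Windows.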