gloo
Collective communications library with various primitives for multi-machine training.
- Changes to control hipify of CUDA_VERSION to HIP_VERSION
- Use GLOO_USE_ROCM instead of __HIP_PLATFORM_HCC__
- Add __HIP_PLATFORM_AMD__ since __HIP_PLATFORM_HCC__ is being deprecated
Summary: The MultiProc tests do not catch multiprocessing errors thoroughly. This diff plugs some of those holes and adds better logging on failures. Differential Revision: D26186660
When trying to build the library on Ubuntu with CMake using clang++-11 with libc++, the following error occurs: `/home/lib/pytorch/third_party/gloo/gloo/transport/tcp/device.cc:152:39: error: implicit instantiation of undefined template 'std::__1::array' std::array hostname; ^` /usr/lib/llvm-10/bin/../include/c++/v1/__tuple:219:64:...
This clears the warning: CMake Warning: The package name passed to `find_package_handle_standard_args` (RCCL) does not match the name of the calling package (rccl). This can lead to problems in calling...
Summary: Add alltoall and alltoallv to Gloo Differential Revision: D21873282
To avoid a collision with a variable in the RCCL CMake file. This should fix the error about not finding `-lrccl` in https://github.com/pytorch/pytorch/pull/31341 (now refiled as https://github.com/pytorch/pytorch/pull/34683)
These were disabled in #230 because they all fail when run consecutively. When run independently, they appear to pass...
The NVLink cube mesh architecture has partial peer access between devices. Two groups of 4 GPUs have full peer access and every GPU in one group has peer access to...
For Gloo in PyTorch distributed, as described in https://pytorch.org/docs/stable/distributed.html, will the following code get the performance benefits of CUDA-aware MPI? (e.g., GPU-to-GPU transfers over PCIe that bypass the CPU)...