Results 56 issues of Pieter Noordhuis

These were disabled in #230 because they all fail when run consecutively. When run independently, they appear to pass...

CLA Signed

The NVLink cube mesh architecture has partial peer access between devices. Two groups of 4 GPUs have full peer access and every GPU in one group has peer access to...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack): * **#243 Use a single listening socket per device** * #242 Add error class * #241 Add RAII wrapper for socket * #240 Allow deferring functions to...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack): * #243 Use a single listening socket per device * #242 Add error class * #241 Add RAII wrapper for socket * **#240 Allow deferring functions to...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack): * #243 Use a single listening socket per device * #242 Add error class * **#241 Add RAII wrapper for socket** * #240 Allow deferring functions to...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack): * #243 Use a single listening socket per device * **#242 Add error class** * #241 Add RAII wrapper for socket * #240 Allow deferring functions to...

CLA Signed

Per @jjlilley in https://github.com/facebookincubator/gloo/pull/237#discussion_r356780531, we can use an `eventfd(2)` to avoid busy-spinning the epoll loop. If we do, we must also update the code that unregisters an fd to either:...

The `notify_send_ready` and `notify_recv_ready` messages used in the tcp backend (and the future uv backend, see #195) need better documentation, as does the protocol that governs how they are sequenced.

This is what we do in PyTorch upstream today but it would be good to move the functionality into Gloo. This would be a new type of context that wraps...

enhancement

The comments mention it is usable for any `#nodes == c * base ^ x`, for any `c >= 1`, `base >= 2`, and `x >= 1`, but in reality...
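The condition quoted from the comments can be checked for a fixed `base` with a few lines of arithmetic. The sketch below covers only that stated condition, not the actual behavior the issue goes on to describe. The helper name `decompose` is hypothetical, and it assumes `base` is given and that the largest possible `x` is wanted.

```python
def decompose(nodes, base):
    """Return (c, x) with nodes == c * base**x and x as large as possible.

    Returns None when no x >= 1 exists, i.e. when nodes is not a
    multiple of base. Hypothetical helper for illustration only.
    """
    if nodes < 1 or base < 2:
        raise ValueError("expected nodes >= 1 and base >= 2")
    c, x = nodes, 0
    # Divide out factors of base until none remain.
    while c % base == 0:
        c //= base
        x += 1
    return (c, x) if x >= 1 else None
```

For example, `decompose(12, 2)` gives `(3, 2)` because 12 = 3 * 2^2, while `decompose(7, 2)` gives `None` because 7 is not a multiple of 2.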