Ke Wen

36 issues by Ke Wen

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #138400

`all_gather_object` and `gather_object` have been tested in `test_c10d_nccl.py` and `test_c10d_object_collective.py`. Removing this third set. cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj...

oncall: distributed
topic: not user facing

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #137544
* __->__ #138384
* #138374
* #137855

Previously we only waited for comm to become ready after its initialization. But that's not...

oncall: distributed
release notes: distributed (c10d)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #137544
* #138384
* #138374
* #137855

Resolves RFC https://github.com/pytorch/pytorch/issues/137007.

Changelist:
- Set default value of `nccl_use_nonblocking` to true (previous: false).

cc...

oncall: distributed
release notes: distributed (c10d)
keep-going

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #137544
* #138384
* __->__ #138374
* #137855

- Added default value for `nccl_nonblocking_timeout` (30 mins, previous: -1).
- Reuse C10D_CHECK_TIMEOUT in other...

oncall: distributed
release notes: distributed (c10d)
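
As a rough illustration of a bounded wait with a default timeout, here is a minimal Python sketch. It is not the c10d implementation: `wait_until_ready`, `DEFAULT_TIMEOUT_S`, the polling interval, and the treatment of a negative timeout as "wait indefinitely" are all assumptions made for this example.

```python
import time

# Hypothetical default, mirroring the 30-minute value mentioned in the summary above.
DEFAULT_TIMEOUT_S = 30 * 60


def wait_until_ready(is_ready, timeout_s=DEFAULT_TIMEOUT_S, poll_interval_s=0.01):
    """Poll `is_ready()` until it returns True or the timeout elapses.

    Assumption for this sketch: a negative `timeout_s` disables the timeout.
    Returns the number of seconds spent waiting.
    """
    start = time.monotonic()
    while not is_ready():
        elapsed = time.monotonic() - start
        if timeout_s >= 0 and elapsed > timeout_s:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        time.sleep(poll_interval_s)
    return time.monotonic() - start
```

With a finite default, a caller that never becomes ready fails with a `TimeoutError` instead of waiting forever, which is the general motivation for replacing an unbounded default.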

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #141192

Adding `destroy_pg_upon_exit` property to allow derived Test classes to control whether auto destroy is desired. (Otherwise, derived test classes will need...

oncall: distributed
release notes: distributed (c10d)
topic: not user facing
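
The overridable-property pattern described in this summary can be sketched as follows. Only the name `destroy_pg_upon_exit` comes from the summary; `BaseDistributedTest`, `ManualCleanupTest`, `set_up`, `tear_down`, and the `pg_alive` flag are hypothetical stand-ins, and no real process group is created.

```python
class BaseDistributedTest:
    """Hypothetical base test class; not the actual PyTorch test class."""

    def __init__(self):
        # Stand-in flag for "a process group currently exists".
        self.pg_alive = False

    @property
    def destroy_pg_upon_exit(self) -> bool:
        # Default behavior in this sketch: clean up automatically on exit.
        return True

    def set_up(self):
        # Stand-in for creating a process group.
        self.pg_alive = True

    def tear_down(self):
        # The base class consults the property, so subclasses can opt out
        # without overriding tear_down itself.
        if self.destroy_pg_upon_exit:
            self.pg_alive = False


class ManualCleanupTest(BaseDistributedTest):
    """Derived class that opts out of automatic destruction."""

    @property
    def destroy_pg_upon_exit(self) -> bool:
        return False
```

A derived class changes the cleanup behavior by overriding one property rather than reimplementing the teardown logic.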

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #141168

Pulling a PR to test viability. Today's timeout is 300s, which could waste quite a bit of machine time if a hang happens...

oncall: distributed
ciflow/trunk
topic: not user facing
ciflow/periodic
test-config/distributed
test-config/multigpu