mrshenli issues

Results: 7 issues of mrshenli

[WIP] Make allreduce compatible with fx ProxyTensor

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #81930 land after #83122 This PR explores solutions for 2 issues: 1. Collective comm ops are in-place ops and do not return...

oncall: distributed

cla signed

Add rpc and dist autograd examples

Creating this issue for tracking purposes. Example details are TBD. It could include some popular large NLP models. @pritamdamania87 @aazzolini

enhancement

distributed

Allow jagged_index_select backward to accept pre-computed output shape

Summary: Save `num_dense_output_rows` computed during the forward pass and use it to avoid a blocking `.item()` call during the backward pass. Differential Revision: D54173841

fb-exported

cla signed

Avoid optional::value_or to skip materializing .item() when possible

Differential Revision: D54173842

fb-exported

cla signed

Allow jagged_index_select to accept pre-computed output shape

Summary: `jagged_index_select`'s CPU kernel API already accepts `num_dense_output_rows` as an argument. Generalize this to the CUDA kernel as well, which can avoid a CPU-blocking `.item()` call in the CUDA...

fb-exported

cla signed

SHM transport cannot scale to more than 18 processes on the same machine

See discussions in the following post: https://discuss.pytorch.org/t/rpc-behavior-difference-between-pytorch-1-7-0-vs-1-9-0/124772/5

Gloo does not support empty inputs on reduce/allreduce/allgather

Reduce and Allreduce ops apply a sanity check to enforce non-empty inputs [[here](https://github.com/facebookincubator/gloo/blob/master/gloo/allreduce.cc#L95)]. Allgather returns error code 8 on empty inputs. Does it make sense to support empty inputs in these...