mrshenli

Results 7 issues of mrshenli

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #81930. Land after #83122. This PR explores solutions for 2 issues: 1. Collective comm ops are in-place ops and do not return...

oncall: distributed
cla signed
fx

Creating this issue for tracking purposes. Example details are TBD. It could include some popular large NLP models. @pritamdamania87 @aazzolini

enhancement
distributed

Summary: Save `num_dense_output_rows` computed during the forward pass and reuse it to avoid a blocking `.item()` call during the backward pass. Differential Revision: D54173841

fb-exported
cla signed
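
The summary above (caching `num_dense_output_rows` in the forward pass so the backward pass does not need `.item()`) can be illustrated with a rough plain-Python sketch. This is not the actual kernel code: `Ctx`, `forward`, and `backward` are hypothetical stand-ins, and Python lists replace tensors, so there is no real device synchronization here. It only shows the caching pattern, assuming the selected segments are concatenated row by row.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ctx:
    """Hypothetical stand-in for an autograd context object."""
    lengths: Optional[List[int]] = None
    indices: Optional[List[int]] = None
    num_dense_output_rows: Optional[int] = None


def _offsets(lengths: List[int]) -> List[int]:
    # offsets[i] is the first row of segment i in the flat row list.
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def forward(ctx: Ctx, values: List[float], lengths: List[int],
            indices: List[int]) -> List[float]:
    """Select whole jagged segments by index and concatenate their rows."""
    offsets = _offsets(lengths)
    out: List[float] = []
    for i in indices:
        out.extend(values[offsets[i]:offsets[i + 1]])
    ctx.lengths = lengths
    ctx.indices = indices
    # Cache the output row count as a plain int. In a tensor-based version
    # (assumption), this is the value that would otherwise be read back
    # from the device again during backward.
    ctx.num_dense_output_rows = len(out)
    return out


def backward(ctx: Ctx, grad_output: List[float]) -> List[float]:
    """Scatter-add output-row gradients back onto the input rows."""
    # Reuse the cached count instead of recomputing it from lengths/indices.
    n = ctx.num_dense_output_rows
    assert len(grad_output) == n, "grad_output must have one entry per output row"
    offsets = _offsets(ctx.lengths)
    grad_values = [0.0] * offsets[-1]
    pos = 0
    for i in ctx.indices:
        for row in range(offsets[i], offsets[i + 1]):
            grad_values[row] += grad_output[pos]
            pos += 1
    return grad_values
```

A segment that is selected more than once accumulates gradient from every copy, which is why the backward step adds rather than assigns.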

Differential Revision: D54173842

fb-exported
cla signed

Summary: `jagged_index_select`'s CPU kernel API already accepts `num_dense_output_rows` as an argument. Generalize this to the CUDA kernel as well, which can avoid a CPU-blocking `.item()` call in the CUDA...

fb-exported
cla signed

See discussions in the following post: https://discuss.pytorch.org/t/rpc-behavior-difference-between-pytorch-1-7-0-vs-1-9-0/124772/5

Reduce and Allreduce ops apply a sanity check to enforce non-empty inputs [[here](https://github.com/facebookincubator/gloo/blob/master/gloo/allreduce.cc#L95)]. Allgather returns error code 8 on empty inputs. Does it make sense to support empty inputs in these...