xla
ZeRO1: Add bucketing logic to control the size of tensors for all-gather/reduce-scatter
This PR updates the XLA ZeRO1 implementation to use coalesced all-gather and coalesced reduce-scatter.
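
The description does not show the bucketing logic itself, so the following is only a minimal sketch of one way to cap the amount of data per collective, assuming a greedy grouping by byte size. The names `bucket_by_size` and `cap_bytes` are hypothetical and are not taken from this PR or from the torch_xla API.

```python
from typing import List, Sequence


def bucket_by_size(sizes_bytes: Sequence[int], cap_bytes: int) -> List[List[int]]:
    """Greedily group tensor indices so each bucket's total size stays within cap_bytes.

    Hypothetical helper for illustration only. A tensor that is larger than
    the cap on its own is placed in a bucket by itself.
    """
    if cap_bytes <= 0:
        raise ValueError("cap_bytes must be positive")

    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0

    for idx, size in enumerate(sizes_bytes):
        # Close the current bucket if adding this tensor would exceed the cap.
        if current and current_bytes + size > cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size

    # Flush the last, partially filled bucket.
    if current:
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    # Five tensors with sizes in bytes, capped at 8 bytes per bucket.
    print(bucket_by_size([4, 4, 4, 10, 2], cap_bytes=8))
```

In a setup like the one described, each resulting bucket would presumably be handed to a single coalesced all-gather or reduce-scatter call, so the cap bounds how much data one collective moves at a time.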