xla
[XLA:GPU] Custom kernel for small sum reductions that is intended to run faster than NCCL.