xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[mlir][hlo][sparse] override the inferred dense return type from xla::ConcatInDim when a sparse return value is requested.
Fold MapOp to one of its ContractionOp operands. Given a MapOp that adds a ContractionOp to some other op A, fold the MapOp by making A the init operand of...
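A minimal sketch of the algebraic identity behind this fold, written as plain C++ loops rather than the actual MLIR ops (`Mat`, `Contract`, and `Add` are illustrative stand-ins, not XLA code): a contraction with a zero init followed by an elementwise add of A gives the same result as the contraction with A as its init operand.

```cpp
#include <array>
#include <cassert>

// Illustrative stand-in for a 2x2 matmul contraction with an explicit
// init operand: out[i][j] = init[i][j] + sum_k lhs[i][k] * rhs[k][j].
using Mat = std::array<std::array<int, 2>, 2>;

Mat Contract(const Mat& lhs, const Mat& rhs, const Mat& init) {
  Mat out = init;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k) out[i][j] += lhs[i][k] * rhs[k][j];
  return out;
}

Mat Add(const Mat& x, const Mat& y) {
  Mat out{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) out[i][j] = x[i][j] + y[i][j];
  return out;
}

int main() {
  Mat lhs = {{{1, 2}, {3, 4}}}, rhs = {{{5, 6}, {7, 8}}};
  Mat a = {{{9, 9}, {9, 9}}}, zero{};
  // map(add, contraction(lhs, rhs, zero), a) folds to
  // contraction(lhs, rhs, /*init=*/a).
  assert(Add(Contract(lhs, rhs, zero), a) == Contract(lhs, rhs, a));
}
```

Folding the add into the init operand saves an intermediate tensor and a separate elementwise traversal.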
This relates to the JAX issue [#14655](https://github.com/google/jax/issues/14655): copying in various details from that thread below. I've got a use case where I'd like to store the nonzero entries of a...
[XLA:GPU] Remove unnecessary upcasts and downcasts just for an add
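A hedged C++ analogy for the pattern being removed, with float standing in for the narrow HLO type and double for the upcast type: widening both operands, adding, and narrowing back yields the same correctly rounded result as adding at the original precision (for a single add, 53 mantissa bits ≥ 2·24 + 2, so no double rounding occurs under IEEE arithmetic), which is why the converts can be dropped.

```cpp
#include <cassert>

// Stand-in for the HLO pattern: downcast(upcast(a) + upcast(b)).
// Assumes IEEE-754 arithmetic with no excess intermediate precision.
float UpcastAddDowncast(float a, float b) {
  return static_cast<float>(static_cast<double>(a) + static_cast<double>(b));
}

int main() {
  float a = 0.1f, b = 0.2f;
  // Equivalent to adding directly at float precision, so the casts are
  // unnecessary.
  assert(UpcastAddDowncast(a, b) == a + b);
}
```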
[XLA:GPU] Custom kernel for small sum reductions that is intended to run faster than NCCL.
Internal CI testing.
Scalarize ScatterOp during tiling if tile_size=1
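A rough C++ sketch of what scalarization buys here (function and variable names are hypothetical, not the actual tiling code): when tile_size is 1, the per-tile inner loop of a scatter update collapses to a single indexed read-modify-write.

```cpp
#include <cstdint>
#include <vector>

// Generic tiled scatter-add: for each index, apply a tile_size-wide slice
// of `updates` to `out`.
void ScatterAddTiled(std::vector<float>& out,
                     const std::vector<int64_t>& indices,
                     const std::vector<float>& updates, int64_t tile_size) {
  for (size_t i = 0; i < indices.size(); ++i)
    for (int64_t t = 0; t < tile_size; ++t)
      out[indices[i] + t] += updates[i * tile_size + t];
}

// Scalarized form for tile_size == 1: the tile loop disappears and each
// update becomes a single read-modify-write.
void ScatterAddScalar(std::vector<float>& out,
                      const std::vector<int64_t>& indices,
                      const std::vector<float>& updates) {
  for (size_t i = 0; i < indices.size(); ++i) out[indices[i]] += updates[i];
}

int main() {
  std::vector<float> out(4, 0.f);
  ScatterAddScalar(out, {2, 0, 2}, {1.f, 2.f, 3.f});  // out = {2, 0, 4, 0}
}
```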
[XLA:SPMD] Factor out utility functions from SPMD to be used elsewhere.
Use std::optional instead of llvm::Optional. Note that llvm::Optional is just an alias for std::optional these days and has since been deprecated upstream in favor of std::optional.
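Since llvm::Optional is now only an alias, the migration is mechanical: swap the spelling and the header. A minimal sketch (FindWidth is a hypothetical function, not from the XLA codebase):

```cpp
#include <optional>

// Before: llvm::Optional<int> FindWidth(bool known);  // llvm/ADT/Optional.h
// After: the standard type, identical semantics.
std::optional<int> FindWidth(bool known) {
  if (!known) return std::nullopt;  // llvm::None is likewise deprecated
  return 42;
}

int main() { return FindWidth(true).value_or(0) == 42 ? 0 : 1; }
```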
With the latest implementation of Latency Hiding Scheduling, we observe that most weight gradient all-reduce latency is still exposed [(see slides 6 and 7 here)](https://docs.google.com/presentation/d/1s2B4DPuhOVQbJ4SAZA7XWBKL5ST-Dfcn/edit#slide=id.g1895a52e93e_0_0). Here is a brief...