xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[XLA] Change reduce ops that reduce a size-1 2nd-minor dimension into a less expensive reshape.
[XLA:GPU] Cleanup: Use the new calling convention for non-fusion operations too and remove the old one. Reminder about the new calling convention: - We pass all arguments / output buffers...
[XLA] Code cleanups that also reduce code size.
Generalize copy elimination pattern. The current pattern only triggers when 1) the src is deallocated immediately after the copy and 2) the dst is allocated in the program. 1) is...
Add documentation for the deallocation/buffer-reuse passes.
[PJRT:C] Retrieve and return C API callback error message to the caller.
[xla:cpu] Enable fusion of degenerating reshape ops
Add SavedModel to StableHLO Converter to TensorFlow pip package
Express test tolerances for exhaustive tests relative to the accuracy and subnormal boundary of the data type. The following table shows the default absolute and relative error tolerances for each...
[XLA:GPU] Error out early on missing autotuning cache when required. This is technically a performance optimization: we would still error out later on otherwise, but after recompiling & running a...
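
The idea behind the first [XLA] change listed above (a reduce over a size-1 dimension acting like a reshape) can be illustrated with a minimal sketch in plain Python. This is not XLA code: the helpers `reduce_axis1` and `drop_axis1` are hypothetical names invented for this illustration, nested lists stand in for arrays, and the sketch assumes a default layout in which the middle axis of a `[2][1][3]` shape is the second-minor dimension. The equivalence also assumes the reduction's initial value is the identity of the operation (0 for addition).

```python
from functools import reduce
import operator


def reduce_axis1(x, op, init):
    """Reduce a nested list of shape [a][b][c] over axis 1 (hypothetical helper)."""
    cols = len(x[0][0])
    return [
        [reduce(op, (plane[j][k] for j in range(len(plane))), init)
         for k in range(cols)]
        for plane in x
    ]


def drop_axis1(x):
    """Reshape [a][1][c] to [a][c] by dropping the size-1 axis (hypothetical helper)."""
    assert all(len(plane) == 1 for plane in x), "axis 1 must have size 1"
    return [list(plane[0]) for plane in x]


# Shape [2][1][3]: the middle axis has size 1.
x = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]

reduced = reduce_axis1(x, operator.add, 0.0)  # sums over a single element
reshaped = drop_axis1(x)                      # no arithmetic at all

print(reduced == reshaped)
```

Because each reduction window holds exactly one element, combining it with the identity value returns the element unchanged, so the result carries the same data as the input with the size-1 axis removed. That is why a compiler can substitute a reshape, which performs no arithmetic.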