xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Introduce hermetic CUDA in Google ML projects. 1) Hermetic CUDA rules allow building wheels with GPU support on a machine without GPUs, as well as running Bazel GPU tests on...
Always use std::array. `Eigen::array` is being removed upstream in favor of `std::array`.
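A minimal sketch of the substitution (the names `dims` and `Volume` are illustrative, not XLA call sites; newer Eigen versions alias `Eigen::array` to `std::array` directly, so common usage carries over unchanged):

```cpp
#include <array>
#include <cstdint>

// Before: Eigen::array<int64_t, 3> dims = {1, 2, 3};
// After:  std::array is a drop-in replacement here, since Eigen::array
// mirrors the std::array interface.
std::array<int64_t, 3> dims = {1, 2, 3};

// Range-for and indexing work identically on both container types.
int64_t Volume() {
  int64_t v = 1;
  for (int64_t d : dims) v *= d;
  return v;
}
```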
Minor change to add logic for finding all lines with the same id.
[XLA:GPU] Run the autotuner with the cuBLAS config only if `--xla_gpu_cublas_fallback=true`. Currently we always compile the cuBLAS config by default and only later drop it from the candidate list of configs if the flag...
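For context, `xla_gpu_cublas_fallback` is exposed as an XLA debug option; a hedged sketch of toggling it programmatically, assuming the standard proto-generated setter for a `DebugOptions` field of that name (as the `--xla_gpu_...` flag naming suggests):

```cpp
#include "xla/xla.pb.h"  // DebugOptions proto

// Sketch only: with the fallback disabled, the autotuner no longer has to
// compile a cuBLAS config that would later be dropped from the candidates.
xla::DebugOptions MakeDebugOptions(bool allow_cublas_fallback) {
  xla::DebugOptions opts;
  opts.set_xla_gpu_cublas_fallback(allow_cublas_fallback);
  return opts;
}
```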
Affects only Hopper+ and cuDNN 9+: https://github.com/openxla/xla/blob/fb41b76a8b08216b80abb49ceb5c07373d9c45c5/xla/service/gpu/gemm_fusion_autotuner.cc#L556. Description of fusion level 1: https://github.com/openxla/xla/blob/fb41b76a8b08216b80abb49ceb5c07373d9c45c5/xla/xla.proto#L742.
[XLA:GPU] Pass the CUDA / ROCm toolkit version explicitly for autotuning and GEMM rewriting. This makes it possible to remove more `#if GOOGLE_CUDA` preprocessor directives from HLO passes.
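A minimal sketch of the pattern being described (all names below are hypothetical, not the actual XLA signatures): the toolkit version becomes a runtime parameter, so the pass branches on a value instead of being compiled in or out with `#if GOOGLE_CUDA`:

```cpp
#include <string>

// Hypothetical stand-in for the toolkit version handed to the pass.
struct ToolkitVersion {
  int major = 0;
  int minor = 0;
};

// Before: the CUDA-specific logic was guarded by `#if GOOGLE_CUDA`, so it
// only existed in CUDA builds. After: the caller supplies the version and
// the pass branches at runtime, which also lets the same code path be
// exercised in tests on machines without a CUDA toolchain.
std::string GemmRewriteStrategy(const ToolkitVersion& cuda_version) {
  if (cuda_version.major >= 12) {
    return "use-cublaslt";  // illustrative placeholder decision
  }
  return "use-legacy-cublas";
}
```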
There is a bug in the ComputeThreadIdToOutputIndexing function: it currently cannot compute the indexing map correctly for side outputs. Fix it and add a corresponding test.
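To make "thread-id-to-output indexing" concrete, here is a toy row-major delinearization of the kind such a map performs (a hypothetical illustration, not the XLA implementation; the commit summary does not detail the actual defect):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy thread-id -> output-index map: delinearize a linear thread id into
// coordinates of a row-major output with extents `dims`. A side output
// whose shape differs from the hero output's needs its own map rather
// than reusing the hero's.
template <std::size_t R>
std::array<int64_t, R> DelinearizeRowMajor(
    int64_t thread_id, const std::array<int64_t, R>& dims) {
  std::array<int64_t, R> idx{};
  for (std::size_t i = R; i-- > 0;) {  // innermost dimension varies fastest
    idx[i] = thread_id % dims[i];
    thread_id /= dims[i];
  }
  return idx;
}
```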
Add a build macro to generate HLO compilation test build rules.