xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
This patch constrains dynamic-slice-fusion so that it is only triggered when the offset is either a constant or a loop-iteration offset, which guarantees that no D2H copy is required.
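The constraint above amounts to a predicate over the slice offsets. A minimal sketch, assuming offsets can be classified into three hypothetical kinds (OffsetKind, SliceOffset and can_fuse_dynamic_slice are illustrative names, not XLA's actual API): fusion is allowed only when every offset is known on the host without reading device memory.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OffsetKind(Enum):
    # Hypothetical classification of a dynamic-slice offset.
    CONSTANT = auto()        # compile-time constant
    LOOP_ITERATION = auto()  # derived from the loop induction variable
    DATA_DEPENDENT = auto()  # computed from device data at runtime


@dataclass(frozen=True)
class SliceOffset:
    kind: OffsetKind


def can_fuse_dynamic_slice(offsets):
    """Return True only if every offset is host-computable,
    i.e. no device-to-host (D2H) copy is needed to resolve it."""
    host_known = (OffsetKind.CONSTANT, OffsetKind.LOOP_ITERATION)
    return all(o.kind in host_known for o in offsets)


# Usage: a data-dependent offset blocks the fusion.
print(can_fuse_dynamic_slice([SliceOffset(OffsetKind.CONSTANT),
                              SliceOffset(OffsetKind.LOOP_ITERATION)]))
print(can_fuse_dynamic_slice([SliceOffset(OffsetKind.DATA_DEPENDENT)]))
```

Any offset that must be read back from the device rejects the fusion, which is the case the patch rules out.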
Specific tests will be fixed as we go
This PR aims to add SYCL runtime support; with it, we can run basic JAX GPU unit tests (UTs). It includes: 1. sycl runtime crosstool build 2. sycl stream executor 3. spirv-llvm-translator...
After running ./configure.py --backend=CUDA and then bazel build --test_output=all --spawn_strategy=sandboxed //xla/..., setting the platform to CUDA with TF_ASSERT_OK_AND_ASSIGN(se::Platform * platform, PlatformUtil::GetPlatform("cuda")); fails with the error: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that...
In the decoding stage of some MOE model inferences, XLA squeezes dimensions of size 1 when sequence length is 1. For example, it transforms a shape of [1, 4096] into...
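Squeezing removes every dimension of size 1 from a shape. A minimal sketch of that generic shape transformation, assuming a hypothetical [batch, seq, hidden] layout with seq = 1 (squeeze_unit_dims is an illustrative helper, not an XLA function); it also records which axes were dropped, since that information is needed to restore the original shape later.

```python
def squeeze_unit_dims(shape):
    """Drop all size-1 dimensions (generic squeeze semantics).

    Returns the squeezed shape and the indices of the removed axes.
    """
    kept = tuple(d for d in shape if d != 1)
    removed = tuple(i for i, d in enumerate(shape) if d == 1)
    return kept, removed


# Usage: [batch=2, seq=1, hidden=8] loses its sequence axis.
print(squeeze_unit_dims((2, 1, 8)))
# When batch is also 1, that axis is squeezed away too.
print(squeeze_unit_dims((1, 1, 8)))
```

Because a generic squeeze drops every unit dimension, a batch of size 1 disappears along with the sequence axis, so the rank of the result depends on the runtime shape.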
Automated Code Change
Automated Code Change
Implement MHLO->HLO conversion for entry parameter layout tiles. Does not support the following cases: * Parameters with nested layouts and tiles (e.g. tuple of tuples). * Multi result layouts and...
Integrate LLVM at llvm/llvm-project@070ce816dadb Updates LLVM usage to match [070ce816dadb](https://github.com/llvm/llvm-project/commit/070ce816dadb)