xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Adds more documentation explaining the transformations that gpuWindowedEinsum performs to optimize compute-communication overlap.
A previous PR https://github.com/openxla/xla/pull/15170 added a Python binding for accessing the profiled instruction. However, the API added there duplicates logic from `get_fdo_profile`, which works very similarly to the added...
The windowed einsum loops used to be unrolled by a factor of 2 to achieve overlap between two GEMMs. But that leaves some of the dynamic update slices at the...
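The overlap idea behind the unroll-by-2 pattern can be sketched in plain Python: while the GEMM for one chunk runs, the communication fetching the operand for the next chunk is already issued. The function name and the `("comm", i)` / `("gemm", i)` event tuples below are purely illustrative, not XLA's actual IR or pass names; this is a minimal sketch of the scheduling order, under those assumptions.

```python
def windowed_einsum_schedule(num_chunks):
    """Illustrative issue order for an unroll-by-2 windowed einsum loop.

    Each iteration of the unrolled body issues the communication for the
    *next* operand chunk before computing the GEMM on the current one,
    so the collective can overlap with the matmul.
    """
    events = []
    for i in range(0, num_chunks, 2):
        # Unrolled loop body covers two chunks per trip.
        for j in (i, i + 1):
            if j >= num_chunks:
                break
            if j + 1 < num_chunks:
                # Prefetch the operand for the following chunk.
                events.append(("comm", j + 1))
            # Compute the GEMM for the current chunk; it can run while
            # the communication issued above is in flight.
            events.append(("gemm", j))
    return events
```

For four chunks this yields an interleaved `comm`/`gemm` ordering in which every GEMM (except the last) has the next chunk's communication already issued ahead of it.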
[XLA:GPU] Support mocking multi-gpu execution from a single GPU Allows running multi-GPU HLOs on a machine with a single GPU, when `--enable_mock_nccl` is used. Example usage: ``` ./tools/multihost_hlo_runner/hlo_runner_main --device_type=gpu --use_spmd_partitioning...
[XLA] Remove unneeded backend tags The fusion test should be run in OSS, and it does not require two GPUs.
[XLA] [NFC] Remove dead line
[XLA] [NFC] Remove extra indentation
[XLA] [NFC] Fix error handling for service creation
Add jax_test configs for shardy, enable it for pjit_test.py, and fix failing tests. Tests fixed include: - `test_globally_sharded_key_array_8x4_multi_device` - the issue was in `replicate_trailing_dims`, where an `xc.OpSharding` was always created....
Integrate LLVM at llvm/llvm-project@d3c9bb0cf811 Updates LLVM usage to match [d3c9bb0cf811](https://github.com/llvm/llvm-project/commit/d3c9bb0cf811)