
A machine learning compiler for GPUs, CPUs, and ML accelerators

Results: 653 xla issues

Adds more documentation explaining the transformations that gpuWindowedEinsum performs to optimize compute-communication overlap.

A previous PR, https://github.com/openxla/xla/pull/15170, added a Python binding for accessing the profiled instruction. However, the API added there duplicates logic in `get_fdo_profile`, which works very similarly to the added...

The windowed einsum loops used to be unrolled by a factor of 2 to achieve overlap between two gemms, but that leaves some of the dynamic update slices at the...
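The unroll-by-2 idea above can be sketched in plain Python. This is an illustrative toy, not XLA's implementation: the function name, the scalar "chunks", and the multiply standing in for a gemm are all hypothetical placeholders. In the real pipelined schedule, the communication that produces one chunk would run concurrently with the gemm on the other.

```python
def windowed_einsum(chunks, weights):
    """Accumulate chunk * weights over all chunks, two steps per iteration.

    Unrolling the body by 2 gives the scheduler two independent gemms per
    iteration, so the data movement for one half can overlap the compute
    of the other half (conceptually; here everything runs sequentially).
    """
    acc = 0
    i = 0
    while i + 1 < len(chunks):
        acc += chunks[i] * weights      # gemm on the even chunk
        acc += chunks[i + 1] * weights  # gemm on the odd chunk (overlap partner)
        i += 2
    if i < len(chunks):                 # leftover chunk when the count is odd
        acc += chunks[i] * weights
    return acc
```

The result is identical to the rolled loop; only the schedule changes, which is why the remaining dynamic update slices mentioned above matter for the final layout.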

[XLA:GPU] Support mocking multi-GPU execution from a single GPU. Allows running multi-GPU HLOs on a machine with a single GPU when `--enable_mock_nccl` is used. Example usage:

```
./tools/multihost_hlo_runner/hlo_runner_main --device_type=gpu --use_spmd_partitioning...
```

[XLA] Remove unneeded backend tags. The fusion test should be run in OSS, and it does not require 2 GPUs.

[XLA] [NFC] Fix error handling for service creation

Add jax_test configs for shardy, enable it for pjit_test.py, and fix any failing tests. Tests fixed include:
- `test_globally_sharded_key_array_8x4_multi_device`: the issue was in `replicate_trailing_dims`, where an `xc.OpSharding` was always created....

Integrate LLVM at llvm/llvm-project@d3c9bb0cf811 Updates LLVM usage to match [d3c9bb0cf811](https://github.com/llvm/llvm-project/commit/d3c9bb0cf811)