xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Adds more documentation explaining the transformations that gpuWindowedEinsum performs to optimize compute-communication overlap.
A previous PR https://github.com/openxla/xla/pull/15170 added a Python binding for accessing the profiled instruction. However, the API added there duplicates logic from `get_fdo_profile`, which works very similarly to the added...
The windowed einsum loops used to be unrolled by a factor of 2 to achieve overlap between two GEMMs. But that leaves some of the dynamic update slices at the...
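The overlap idea behind the unroll-by-2 pattern can be sketched in plain Python: while the GEMM for one chunk runs, the communication fetching the operand for the next chunk is already issued. The function name and the `("comm", i)` / `("gemm", i)` event tuples below are purely illustrative, not XLA's actual IR or pass names; this is a minimal sketch of the scheduling order, under those assumptions.

```python
def windowed_einsum_schedule(num_chunks):
    """Illustrative issue order for an unroll-by-2 windowed einsum loop.

    Each iteration of the unrolled body issues the communication for the
    *next* operand chunk before computing the GEMM on the current one,
    so the collective can overlap with the matmul.
    """
    events = []
    for i in range(0, num_chunks, 2):
        # Unrolled loop body covers two chunks per trip.
        for j in (i, i + 1):
            if j >= num_chunks:
                break
            if j + 1 < num_chunks:
                # Prefetch the operand for the following chunk.
                events.append(("comm", j + 1))
            # Compute the GEMM for the current chunk; it can run while
            # the communication issued above is in flight.
            events.append(("gemm", j))
    return events
```

For four chunks this yields an interleaved `comm`/`gemm` ordering in which every GEMM (except the last) has the next chunk's communication already issued ahead of it.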
[XLA:GPU] Support mocking multi-gpu execution from a single GPU Allows running multi-GPU HLOs on a machine with a single GPU, when `--enable_mock_nccl` is used. Example usage: ``` ./tools/multihost_hlo_runner/hlo_runner_main --device_type=gpu --use_spmd_partitioning...
[XLA] Remove unneeded backend tags The fusion test should be run in OSS, and it does not require two GPUs.
[XLA] [NFC] Remove dead line
[XLA] [NFC] Remove extra indentation
[XLA] [NFC] Fix error handling for service creation
Add jax_test configs for shardy, enable it for pjit_test.py, and fix failing tests. Tests fixed include: - `test_globally_sharded_key_array_8x4_multi_device` - the issue was in `replicate_trailing_dims`, where an `xc.OpSharding` was always created....
Integrate LLVM at llvm/llvm-project@d3c9bb0cf811 Updates LLVM usage to match [d3c9bb0cf811](https://github.com/llvm/llvm-project/commit/d3c9bb0cf811)