xla issues

Add while loop config options and optional pass pipeline immediately before unroll.

6

This PR adds the ability to configure while loop unroll thresholds. Existing defaults are maintained. This PR also adds the option for the user to specify an HloPassPipeline that will...

patrick-toulme

Automated Code Change

copybara-service[bot]

Panic when do custom_call for gpu cuda

Dear XLA team, I integrated a CUDA custom-call operator in JAX. While using this operator, I encountered the CUDA_ERROR_ILLEGAL_ADDRESS error. I am confident that this CUDA_ERROR_ILLEGAL_ADDRESS does not...

knightXun

Increases the range of values in the Mixed ILP's model of memory consumption from 100 to 1e6.

copybara-service[bot]

FP8 Windowed Einsums with Multiple All-Gather Dots

11

Enables FP8 windowed einsums with all-gathers that have multiple dot users by shifting the dequantization of the FP8 operands to the output of the while loop.

philipphack

[NVIDIA GPU] Use cuda runtime api to determine if 2 ranks are on the same host

1

The current logic in nccl clique sets is_local to true by looking at the number of local participants and total devices in the clique. It's been used to determine if...

Tixxx

Update users of `status_test_util` to use the new location in `xla/tsl`
