xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
This PR adds the ability to configure while-loop unroll thresholds. Existing defaults are maintained. This PR also adds the option for the user to specify an HloPassPipeline that will...
Automated Code Change
Dear xla team, I integrated a CUDA custom-call operator in JAX. During the use of this operator, I encountered the CUDA_ERROR_ILLEGAL_ADDRESS error. I am confident that this CUDA_ERROR_ILLEGAL_ADDRESS does not...
Update users of `status_test_util` to use the new location in `xla/tsl`
Automated Code Change
Automated Code Change
Automated Code Change
Increases the range of values in the Mixed ILP's model of memory consumption from 100 to 1e6.
Enables FP8 windowed einsums with all-gathers that have multiple dot users by shifting the dequantization of the FP8 operands to the output of the while loop.
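Moving the dequantization to the loop output works because per-tensor scaling commutes with a dot product. The following is a minimal pure-Python sketch of that identity, not XLA's implementation: the helper names, the sample matrices, the scale values, and the window size are all hypothetical, and the loop over windows of the contracting dimension only loosely stands in for the windowed einsum's while loop.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, s):
    """Multiply every element of a matrix by a scalar (models dequantization)."""
    return [[v * s for v in row] for row in m]


def windowed_matmul(a, b, window):
    """Accumulate partial products over windows of the contracting dimension."""
    out = [[0.0] * len(b[0]) for _ in a]
    for start in range(0, len(b), window):
        a_win = [row[start:start + window] for row in a]
        b_win = b[start:start + window]
        part = matmul(a_win, b_win)
        out = [[o + p for o, p in zip(ro, rp)] for ro, rp in zip(out, part)]
    return out


# Hypothetical quantized operands and per-tensor scales (assumed values).
a_q = [[1.0, 2.0], [3.0, 4.0]]
b_q = [[5.0, 6.0], [7.0, 8.0]]
a_scale, b_scale = 0.5, 0.25

# Dequantize the operands first, then run the loop.
dequant_before = windowed_matmul(scale(a_q, a_scale), scale(b_q, b_scale), window=1)

# Run the loop on the quantized operands, then dequantize the output once.
dequant_after = scale(windowed_matmul(a_q, b_q, window=1), a_scale * b_scale)
```

Both paths give the same matrix here because the scales are powers of two; with arbitrary scales the results would agree only up to floating-point rounding.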
The current logic in the NCCL clique sets `is_local` to true by looking at the number of local participants and total devices in the clique. It's been used to determine if...
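A rough sketch of the kind of check described above. The summary is truncated and does not state the exact comparison, so the equality test is an assumption, and the function and parameter names are hypothetical rather than XLA's actual API.

```python
def is_local_clique(num_local_participants, clique_devices):
    """Return True when every device in the clique is a local participant.

    Assumption: "local" means the local participant count equals the total
    number of devices in the clique. The source does not confirm this rule.
    """
    return num_local_participants == len(clique_devices)


# Hypothetical example: a four-device clique.
all_local = is_local_clique(4, [0, 1, 2, 3])
partly_remote = is_local_clique(2, [0, 1, 2, 3])
```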