xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[XLA:GatherScatter] Add support for gather/scatter batching dims in MHLO↔HLO conversions
[XLA] [NFC] Remove unused memory_by_computation map. It always starts empty for all callers and is then passed by constant reference.
Convert row reduction tests to HLO tests. This unifies the three types of tests we have right now (IR, correctness, indexing) using two tools: one that converts the HLO to...
Add cudnn frontend support for scaled dot product attention, FP8 forward. Docs [here](https://github.com/NVIDIA/cudnn-frontend/blob/98ca4e1941fe3263f128f74f10063a3ea35c7019/docs/operations/Attention.md).
Right now the collective pipeliner filters by user type when determining whether a value can be pushed to the next iteration of the loop. Bitcast is not among the acceptable users,...
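The user-type filtering this entry describes can be illustrated with a minimal allow-list check. This is a hedged sketch in plain Python, not the actual XLA collective-pipeliner code; the `can_pipeline` helper and the opcode strings are illustrative only:

```python
# Illustrative sketch: a pipeliner that pushes a value across loop
# iterations only when every user of that value has an allow-listed
# opcode. If "bitcast" is missing from the allow list, any value
# consumed by a bitcast is rejected, even though a bitcast does not
# change the underlying data.

ACCEPTABLE_USERS = {"get-tuple-element", "dynamic-update-slice"}

def can_pipeline(user_opcodes, acceptable=ACCEPTABLE_USERS):
    """Return True iff all users of the value are acceptable opcodes."""
    return all(opcode in acceptable for opcode in user_opcodes)

# A value fed into a bitcast is rejected under the original allow list:
print(can_pipeline(["bitcast", "get-tuple-element"]))  # False
# Extending the allow list (the behavior the change proposes):
print(can_pipeline(["bitcast", "get-tuple-element"],
                   ACCEPTABLE_USERS | {"bitcast"}))    # True
```

The design point is that an allow list is conservative by construction: any opcode not explicitly vetted blocks pipelining, so harmless users like bitcast must be added deliberately.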
I am seeing very strange sharding with pipeline, tensor, and data parallelism combined. Below is the HLO exactly before partitioning: ``` while.9466 = (s32[], bf16[4,128,512]{2,1,0}, bf16[4,128,512]{2,1,0}, bf16[4,512,128]{2,1,0}, bf16[4,128]{1,0}, /*index=5*/bf16[4,3,128,32,4]{4,3,2,1,0}, bf16[4,128,32,4]{3,2,1,0},...
[XLA:GPU] Fix order-dependent tests in dynamic_slice_fusion_test.cc
PR #15417: Add while loop config options and an optional pass pipeline immediately before unrolling. Imported from GitHub PR https://github.com/openxla/xla/pull/15417 This PR adds the ability to configure while loop unroll thresholds....
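A threshold-based unroll decision of the kind this PR makes configurable can be sketched as follows. This is a toy Python illustration; the `UnrollConfig` fields are hypothetical names for exposition and are not the actual XLA options:

```python
# Toy sketch of a configurable while-loop unroll decision: a loop is
# fully unrolled only when its trip count is statically known and falls
# at or under a user-configurable threshold.

from dataclasses import dataclass
from typing import Optional

@dataclass
class UnrollConfig:
    max_trip_count: int = 64    # hypothetical threshold option
    force_unroll: bool = False  # hypothetical override option

def should_unroll(trip_count: Optional[int], config: UnrollConfig) -> bool:
    """Decide whether to fully unroll a loop with the given trip count."""
    if config.force_unroll:
        return True
    # Unknown trip counts (None) can never be fully unrolled.
    return trip_count is not None and trip_count <= config.max_trip_count

cfg = UnrollConfig(max_trip_count=16)
print(should_unroll(8, cfg))     # True: under the threshold
print(should_unroll(128, cfg))   # False: too many iterations
print(should_unroll(None, cfg))  # False: trip count unknown
```

Making the threshold a config option, rather than a compile-time constant, lets users trade compile time and code size against the runtime benefit of unrolling per model.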
I'm not 100% sure that I'm doing the right thing, but I'll just say that after this I got rid of some compilation errors on Windows and the linker seems to...