xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[XLA:GatherScatter] Add support for gather/scatter batching dims in MHLO↔HLO conversions
[XLA] [NFC] Remove unused memory_by_computation map. It always starts empty for all callers and is then passed by constant reference.
Convert row reduction tests to HLO tests. This unifies the three types of tests we have right now (IR, correctness, indexing) using two tools: one that converts the HLO to...
Add cudnn frontend support for scaled dot product attention, FP8 forward. Docs [here](https://github.com/NVIDIA/cudnn-frontend/blob/98ca4e1941fe3263f128f74f10063a3ea35c7019/docs/operations/Attention.md).
Right now the collective pipeliner filters by user type when determining whether a value can be pushed to the next iteration of the loop. Bitcast is not among the acceptable users,...
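The user-type filtering this entry describes can be illustrated with a minimal allow-list check. This is a hedged sketch in plain Python, not the actual XLA collective-pipeliner code; the `can_pipeline` helper and the opcode strings are illustrative only:

```python
# Illustrative sketch: a pipeliner that pushes a value across loop
# iterations only when every user of that value has an allow-listed
# opcode. If "bitcast" is missing from the allow list, any value
# consumed by a bitcast is rejected, even though a bitcast does not
# change the underlying data.

ACCEPTABLE_USERS = {"get-tuple-element", "dynamic-update-slice"}

def can_pipeline(user_opcodes, acceptable=ACCEPTABLE_USERS):
    """Return True iff all users of the value are acceptable opcodes."""
    return all(opcode in acceptable for opcode in user_opcodes)

# A value fed into a bitcast is rejected under the original allow list:
print(can_pipeline(["bitcast", "get-tuple-element"]))  # False
# Extending the allow list (the behavior the change proposes):
print(can_pipeline(["bitcast", "get-tuple-element"],
                   ACCEPTABLE_USERS | {"bitcast"}))    # True
```

The design point is that an allow list is conservative by construction: any opcode not explicitly vetted blocks pipelining, so harmless users like bitcast must be added deliberately.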
I am seeing very strange sharding with pipeline, tensor, and data parallelism combined. Below is the HLO exactly before partitioning: ``` while.9466 = (s32[], bf16[4,128,512]{2,1,0}, bf16[4,128,512]{2,1,0}, bf16[4,512,128]{2,1,0}, bf16[4,128]{1,0}, /*index=5*/bf16[4,3,128,32,4]{4,3,2,1,0}, bf16[4,128,32,4]{3,2,1,0},...
[XLA:GPU] Fix order-dependent tests in dynamic_slice_fusion_test.cc
PR #15417: Add while loop config options and an optional pass pipeline immediately before unrolling. Imported from GitHub PR https://github.com/openxla/xla/pull/15417 This PR adds the ability to configure while loop unroll thresholds....
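A threshold-based unroll decision of the kind this PR makes configurable can be sketched as follows. This is a toy Python illustration; the `UnrollConfig` fields are hypothetical names for exposition and are not the actual XLA options:

```python
# Toy sketch of a configurable while-loop unroll decision: a loop is
# fully unrolled only when its trip count is statically known and falls
# at or under a user-configurable threshold.

from dataclasses import dataclass
from typing import Optional

@dataclass
class UnrollConfig:
    max_trip_count: int = 64    # hypothetical threshold option
    force_unroll: bool = False  # hypothetical override option

def should_unroll(trip_count: Optional[int], config: UnrollConfig) -> bool:
    """Decide whether to fully unroll a loop with the given trip count."""
    if config.force_unroll:
        return True
    # Unknown trip counts (None) can never be fully unrolled.
    return trip_count is not None and trip_count <= config.max_trip_count

cfg = UnrollConfig(max_trip_count=16)
print(should_unroll(8, cfg))     # True: under the threshold
print(should_unroll(128, cfg))   # False: too many iterations
print(should_unroll(None, cfg))  # False: trip count unknown
```

Making the threshold a config option, rather than a compile-time constant, lets users trade compile time and code size against the runtime benefit of unrolling per model.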
I'm not 100% sure that I'm doing the right thing, but I'll just say that after this I got rid of some compilation errors on Windows and the linker seems to...