xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Fix build break introduced in ffa7bb5 and df736d7
Column reduction previously supported vectorization; support was removed in [this PR](https://github.com/openxla/xla/commit/72788f177dde61e1efbfd744435aa1985a4ff6c0). Vectorization was disabled because no performance gain was found, and the vectorization heuristic is fairly complex and differs from...
Make it possible to lower fp8 `tt.splat`. Before the fix, `tt.splat` was lowered to e.g. ``` %14 = "llvm.mlir.constant"() : () -> f8E4M3FNUZ ``` which LLVM rejected. Translating the result...
T4 GPUs don't support BF16 matmul. Because of this, XLA switches BF16 matmuls to F32 matmuls on T4 (IIUC). This is obviously much slower, but it turns out it's actually...
Use absl::Status instead of xla::Status now that they're identical.
Use absl::Status instead of xla::Status now that they're identical.
Use absl::Status instead of xla::Status now that they're identical.
[XLA:GPU] Remove GpuStatus.
Use absl::Status instead of xla::Status now that they're identical.