xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[XLA:CPU] Verify invariant buffers of `KernelThunk` in the runtime.
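One plausible way to verify this at runtime, as a minimal sketch under assumptions (the checksum approach, `Fnv1aHash`, and `InvariantBufferChecker` are illustrative, not the actual XLA implementation): snapshot a cheap hash of every buffer marked invariant before the kernel runs, then confirm it is unchanged afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include "absl/types/span.h"

// Simple FNV-1a hash over a byte span; any cheap checksum would do here.
uint64_t Fnv1aHash(absl::Span<const uint8_t> bytes) {
  uint64_t h = 1469598103934665603ull;                     // FNV offset basis
  for (uint8_t b : bytes) h = (h ^ b) * 1099511628211ull;  // FNV prime
  return h;
}

class InvariantBufferChecker {
 public:
  // Record a checksum of each invariant buffer before the kernel runs.
  void Snapshot(absl::Span<const absl::Span<const uint8_t>> buffers) {
    hashes_.clear();
    for (const auto& buf : buffers) hashes_.push_back(Fnv1aHash(buf));
  }

  // After the kernel, verify that no invariant buffer was written to.
  bool Unchanged(absl::Span<const absl::Span<const uint8_t>> buffers) const {
    if (buffers.size() != hashes_.size()) return false;
    for (size_t i = 0; i < buffers.size(); ++i) {
      if (Fnv1aHash(buffers[i]) != hashes_[i]) return false;
    }
    return true;
  }

 private:
  std::vector<uint64_t> hashes_;
};
```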
Move FindInstruction and FindComputation core functionality from hlo_test_base to hlo_query
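A hedged usage sketch of the moved helpers; the signatures and header path are assumed to mirror the old `HloTestBase` ones:

```cpp
#include "xla/hlo/utils/hlo_query.h"

// Look up nodes by name directly on an HloModule, without an HloTestBase.
void InspectModule(xla::HloModule* module) {
  xla::HloInstruction* dot = xla::hlo_query::FindInstruction(module, "dot.1");
  xla::HloComputation* add = xla::hlo_query::FindComputation(module, "add");
  if (dot != nullptr && add != nullptr) {
    // ... inspect the located instruction/computation ...
  }
}
```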
CuDnnThunk, currently used for GEMM fusions, is capable of executing arbitrary cuDNN graphs. Moving FMHA to use it lets us remove a lot of specialized runtime code. An overview of the change...
[PJRT IFRT] Pass the distributed client into the PJRT IFRT layer for TPU (already done for CPU; GPU will be a separate CL). Objective: let IFRT handle topology exchange and other...
Automated Code Change
[XLA:GPU] Stable ordering of keys in gemm+DS rewriter
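The general determinism technique, shown as an illustrative sketch rather than the rewriter's actual code: hash-map iteration order varies between runs, so collect and sort the keys before doing anything order-sensitive.

```cpp
#include <algorithm>
#include <string>
#include <vector>

#include "absl/container/flat_hash_map.h"

// Extract the keys and sort them so downstream rewrites see a stable order.
std::vector<std::string> SortedKeys(
    const absl::flat_hash_map<std::string, int>& map) {
  std::vector<std::string> keys;
  keys.reserve(map.size());
  for (const auto& [key, value] : map) keys.push_back(key);
  std::sort(keys.begin(), keys.end());
  return keys;
}
```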
[XLA:GPU] Add a method to get all constraints for variables in an indexing map. This will allow us to only iterate over constraints in an indexing map once.
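A hypothetical sketch of the single-pass idea (the real `IndexingMap` API may differ): bucket each constraint under every variable it references, so later per-variable queries need no rescans.

```cpp
#include <vector>

// Stand-in for a constraint in an indexing map; the real type holds an
// affine expression and its feasible interval.
struct Constraint {
  std::vector<int> used_variable_ids;
};

// One pass over all constraints, grouping them by the variables they use.
std::vector<std::vector<const Constraint*>> GroupConstraintsByVariable(
    const std::vector<Constraint>& constraints, int num_variables) {
  std::vector<std::vector<const Constraint*>> per_variable(num_variables);
  for (const Constraint& constraint : constraints) {
    for (int var : constraint.used_variable_ids) {
      per_variable[var].push_back(&constraint);
    }
  }
  return per_variable;
}
```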
[XLA:GPU] Introduce the `TiledHloFusionInstruction` class. It is to `TiledHloInstruction` what `HloFusionInstruction` is to `HloInstruction`. Its main purpose will be to wrap nested fusions for block-level code generation.
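A rough shape of the analogy, with member names assumed rather than taken from the actual class:

```cpp
#include <memory>

class TiledHloComputation { /* tiled instructions, roots, ... */ };

class TiledHloInstruction { /* tile offsets, sizes, strides, ... */ };

// Wraps a nested tiled computation for block-level code generation, the way
// HloFusionInstruction wraps an HloComputation.
class TiledHloFusionInstruction : public TiledHloInstruction {
 public:
  const TiledHloComputation* called_computation() const {
    return called_computation_.get();
  }

 private:
  std::unique_ptr<TiledHloComputation> called_computation_;
};
```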
[XLA:GPU] Implement fusing int4 parameters into Triton dots. Right now this works for the simple case where S4 is the LHS argument and the contracting dimension is either minor (dim 1) or not minor (dim 0).
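A hedged sketch of the supported-case predicate (the helper name and the rank-2 assumption are mine, not the rewriter's):

```cpp
#include <cstdint>

#include "xla/hlo/ir/hlo_instruction.h"

// Fuse only when the S4 operand is the dot's LHS and the LHS contracting
// dimension is either minor (index 1 for a rank-2 operand) or major (index 0).
bool IsSimpleInt4DotCase(const xla::HloInstruction* dot) {
  const xla::HloInstruction* lhs = dot->operand(0);
  if (lhs->shape().element_type() != xla::S4) return false;
  const auto& dnums = dot->dot_dimension_numbers();
  if (dnums.lhs_contracting_dimensions_size() != 1) return false;
  const int64_t contracting_dim = dnums.lhs_contracting_dimensions(0);
  return contracting_dim == 0 || contracting_dim == 1;
}
```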
[XLA:GPU] Simplify the C64 cuBLASLt matrix dimension check, i.e. the check that for C64 the non-contracting dimension fed to cuBLASLt is short enough. 1. Removed the MatrixIsColumnMajor() function (which...