xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Fixed #25211 by updating the C API header to explicitly mark num_outputs as an output parameter, clarifying the API contract. I also added a new C API test that calls...
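A rough illustration of that contract, as a hypothetical sketch rather than the actual XLA C API header (the struct and field names below are made up): an output parameter is one the callee writes, and the caller must not rely on the value it passes in.

```cpp
// Hypothetical sketch of an "output parameter" contract in a C API header.
// The struct and field names are illustrative, not the real XLA/PJRT C API.
#include <stddef.h>

typedef struct MyExecuteArgs {
  size_t struct_size;   // in: set by the caller (used for versioning)
  void** output_lists;  // out: populated by the callee
  size_t num_outputs;   // out: written by the callee; the caller must not
                        //      rely on the value it passes in
} MyExecuteArgs;
```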
Cleaned up coordination service key-value store.
Reverts 189d67b2eab0b53e65bb226cf48082db6f633abb
Move `TopKKernel` behind `GpuKernelRegistry`.
* Moves `TopK` logic into `backends/gpu/runtime` since it's a runtime component.
* Defines a trait for the `TopK` kernel in `stream_executor/gpu/`.
* Moves the implementations of this...
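A minimal sketch of the trait-plus-registry pattern this describes; `GpuKernelRegistry` and `TopK` come from the commit, but the class layout, signatures, and `Register`/`Get` methods below are assumptions, not the real stream_executor API:

```cpp
// Hypothetical sketch of the trait + registry pattern. The trait identifies
// the kernel and its launch signature; the registry maps traits to
// backend-specific implementations via type-erased storage.
#include <any>
#include <functional>
#include <map>
#include <string>

struct TopKKernelTrait {
  static constexpr const char* kName = "TopK";
  // Assumed launch signature: (input, num_elements, k, out_values, out_indices).
  using KernelType = std::function<void(const float*, int, int, float*, int*)>;
};

class GpuKernelRegistrySketch {
 public:
  template <typename Trait>
  void Register(typename Trait::KernelType kernel) {
    kernels_[Trait::kName] = std::move(kernel);  // type-erased storage
  }

  template <typename Trait>
  typename Trait::KernelType Get() const {
    return std::any_cast<typename Trait::KernelType>(kernels_.at(Trait::kName));
  }

 private:
  std::map<std::string, std::any> kernels_;
};
```

In this shape, a CUDA or ROCm backend registers its implementation against the trait, and the runtime code in `backends/gpu/runtime` looks it up through the registry instead of depending on the implementation directly.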
Add TMA descriptor extraction to launcher. Port over the getTmaDesc function and refactor it to follow the extractor API. Re-enable the pipeliner and experimental_tma tests, which are fixed by this change.
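A hedged sketch of the extractor pattern this refers to; `getTmaDesc` and the real extractor API are not shown here, and every name below (`KernelArgument`, `TmaDescriptor`, `CollectTmaDescriptors`) is hypothetical:

```cpp
// Hypothetical sketch: the launcher asks an extractor to turn kernel
// arguments into TMA descriptors before launch. Not the real XLA/Triton API.
#include <optional>
#include <vector>

struct TmaDescriptor { unsigned char bytes[128]; };  // opaque, device-encoded

struct KernelArgument {
  const void* device_ptr;
  bool needs_tma;  // whether this argument is accessed through TMA
};

// Returns a descriptor for arguments that need one; std::nullopt otherwise.
std::optional<TmaDescriptor> ExtractTmaDescriptor(const KernelArgument& arg) {
  if (!arg.needs_tma) return std::nullopt;
  TmaDescriptor desc{};
  // ... encode a tensor map for `arg.device_ptr` here (driver-specific) ...
  return desc;
}

// The launcher collects all descriptors up front and passes them to the kernel.
std::vector<TmaDescriptor> CollectTmaDescriptors(
    const std::vector<KernelArgument>& args) {
  std::vector<TmaDescriptor> descs;
  for (const auto& arg : args) {
    if (auto d = ExtractTmaDescriptor(arg)) descs.push_back(*d);
  }
  return descs;
}
```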
Automated Code Change
PR #24744: Don't clone instructions in HloEvaluator. Imported from GitHub PR https://github.com/openxla/xla/pull/24744. There doesn't appear to be any good reason to do this, but it makes HloEvaluator very dangerous to...
[XLA:GPU][TMA] Use the offsets-sizes-strides interface for triton_xla ops. This improves readability and lets us remove our own parsing, printing, and some verification code.
Description:
- Ported profiler's code from pybind11 to nanobind
- Added nanobind mutexes to protect PythonHooks and PythonHookContext under free-threading
- Race seen in JAX: https://github.com/jax-ml/jax/actions/runs/14651237353/job/41117294221?pr=28245#step:18:1561
```
#8 xla::profiler::PythonHookContext::ProfileFast(_frame*, int,...
```
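A minimal sketch of the locking pattern described above, using a plain `std::mutex` as a stand-in for nanobind's free-threading primitives; the module, functions, and guarded state are illustrative placeholders for `PythonHooks`/`PythonHookContext`, not the real `xla::profiler` code:

```cpp
// Hypothetical sketch: guard shared profiler state with a mutex so that
// concurrently running Python threads (free-threaded CPython) cannot race.
#include <mutex>
#include <nanobind/nanobind.h>

namespace nb = nanobind;

namespace {
std::mutex hooks_mu;           // protects the shared hook state below
int active_hook_sessions = 0;  // placeholder for the profiler's shared state
}  // namespace

NB_MODULE(profiler_sketch, m) {
  m.def("start", [] {
    std::lock_guard<std::mutex> lock(hooks_mu);  // serialize concurrent starts
    ++active_hook_sessions;
  });
  m.def("stop", [] {
    std::lock_guard<std::mutex> lock(hooks_mu);  // serialize concurrent stops
    if (active_hook_sessions > 0) --active_hook_sessions;
  });
}
```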
[XLA:LatencyHidingScheduler] Split ReadySetLt::operator() into multiple functions. Split some of the heuristics into different functions to make them easier to read.
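The shape of that refactoring, sketched with made-up heuristics; `ReadySetLt` is named in the commit, but the fields and helper functions below are hypothetical:

```cpp
// Hypothetical sketch: each heuristic becomes a small named function that
// returns "a first", "b first", or "no opinion", and operator() chains them
// in priority order instead of mixing everything in one large body.
#include <optional>

struct CandidatePair {
  // Placeholder features of the two ready instructions being compared.
  int a_depth = 0, b_depth = 0;
  bool a_unlocks_done = false, b_unlocks_done = false;
};

class ReadySetLtSketch {
 public:
  // Returns true if candidate `a` should be scheduled before `b`.
  bool operator()(const CandidatePair& c) const {
    if (auto r = CompareAsyncDepth(c)) return *r;
    if (auto r = CompareUnlocksDone(c)) return *r;
    return TieBreak(c);
  }

 private:
  std::optional<bool> CompareAsyncDepth(const CandidatePair& c) const {
    if (c.a_depth != c.b_depth) return c.a_depth > c.b_depth;
    return std::nullopt;  // no opinion: fall through to the next heuristic
  }
  std::optional<bool> CompareUnlocksDone(const CandidatePair& c) const {
    if (c.a_unlocks_done != c.b_unlocks_done) return c.a_unlocks_done;
    return std::nullopt;
  }
  bool TieBreak(const CandidatePair&) const { return true; }  // stable default
};
```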