xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

Results 653 xla issues

Fixed #25211 by updating the C API header to explicitly mark num_outputs as an output parameter, clarifying the API contract. I also added a new C API test that calls...

Move `TopKKernel` behind `GpuKernelRegistry`.
* Moves `TopK` logic into `backends/gpu/runtime` since it's a runtime component.
* Defines trait for the `TopK` kernel in `stream_executor/gpu/`
* Moves the implementations of this...

Add TMA descriptor extraction to launcher. Port over the getTmaDesc function and refactor it to follow the extractor API. Reenable pipeliner and experimental_tma tests, which get fixed by this change.

PR #24744: Don't clone instructions in HloEvaluator. Imported from GitHub PR https://github.com/openxla/xla/pull/24744 There doesn't appear to be any good reason to do this, but it makes HloEvaluator very dangerous to...

[XLA:GPU][TMA] Use the offsets-sizes-strides interface for triton_xla ops. That improves readability and we can remove our own parsing, printing and some verifications.
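
For readers unfamiliar with the convention, an offsets-sizes-strides description selects a sub-region of a tensor by giving, per dimension, a starting index, an element count, and a step. The sketch below illustrates only that general idea on a plain 2D Python list; `extract_tile` is a hypothetical helper invented for illustration and is not part of the triton_xla ops or any XLA API.

```python
def extract_tile(data, offsets, sizes, strides):
    """Return the 2D sub-region picked out by per-dimension offsets, sizes, strides.

    Hypothetical helper for illustration only; not an XLA or Triton function.
    """
    row_off, col_off = offsets      # where the tile starts in each dimension
    n_rows, n_cols = sizes          # how many elements to take in each dimension
    row_step, col_step = strides    # step between consecutive elements
    return [
        [data[row_off + i * row_step][col_off + j * col_step] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# A 4x6 matrix whose entries encode their position as row * 10 + column.
matrix = [[r * 10 + c for c in range(6)] for r in range(4)]

# Start at row 1, column 0; take 2 rows and 3 columns; step 1 over rows, 2 over columns.
tile = extract_tile(matrix, offsets=(1, 0), sizes=(2, 3), strides=(1, 2))
```

Here `tile` holds every second column of rows 1 and 2, which shows how the three tuples together replace an ad hoc slice description.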

Description:
- Ported profiler's code from pybind11 to nanobind
- Added nanobind mutexes to protect PythonHooks and PythonHookContext under free-threading
- Race seen in JAX: https://github.com/jax-ml/jax/actions/runs/14651237353/job/41117294221?pr=28245#step:18:1561
```
#8 xla::profiler::PythonHookContext::ProfileFast(_frame*, int,...
```

[XLA:LatencyHidingScheduler] Split ReadySetLt::operator() into multiple functions Split some of the heuristics into different functions to make them easier to read.
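
The refactoring pattern described here, breaking one large comparison operator into named per-heuristic functions, can be sketched generically. The candidate fields and heuristic names below are assumptions made up for illustration; they do not reflect the actual heuristics in `ReadySetLt::operator()`.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Candidate:
    # Hypothetical fields, chosen only to give the heuristics something to compare.
    name: str
    unblocks_async_done: bool
    estimated_latency: int


def compare_unblocking(a, b):
    # Prefer candidates that unblock a pending async operation.
    return int(b.unblocks_async_done) - int(a.unblocks_async_done)


def compare_latency(a, b):
    # Prefer candidates with the larger estimated latency.
    return b.estimated_latency - a.estimated_latency


# Heuristics are tried in priority order; the first non-zero result decides.
HEURISTICS = (compare_unblocking, compare_latency)


def compare_candidates(a, b):
    for heuristic in HEURISTICS:
        result = heuristic(a, b)
        if result != 0:
            return result
    return 0  # all heuristics tie


candidates = [
    Candidate("a", False, 5),
    Candidate("b", True, 1),
    Candidate("c", False, 9),
]
ordered = sorted(candidates, key=cmp_to_key(compare_candidates))
order_names = [c.name for c in ordered]
```

Keeping each heuristic in its own function means one rule can be read, tested, or reordered without touching the others, which is the readability benefit the change describes.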