xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

Results 653 xla issues

[Autotuner] Make buffer checking best effort, rather than forcing it. - There are cases in gemm_fusion_autotuner where we don't have a reference output from cuBLAS and we skip the requested...

[IFRT Proxy] Make `ifrt_proxy::client::LoadedExecutable` implement `MpmdLoadedExecutableInterface`. This change updates the `LoadedExecutable` class in the IFRT proxy client to inherit from `xla::ifrt::MpmdLoadedExecutableInterface` and adds declarations for the MPMD-specific methods.

[XLA:GPU] Enable dynamic slice support; replace usages of legacy IsTritonSupportedDynamicSlice

Move CustomKernelThunk into its own file. CustomKernelThunk is currently declared in kernel_thunk.h and this change moves it into its own file custom_kernel_thunk.h. The same is done for the implementation (kernel_thunk.cc...

KernelSpecTest improvements and cleanups - Improves how we invent pointers to CUDA kernels - Adds parameter comments for ambiguous parameters - Makes use of `ParseTextProtoOrDie`

📝 Summary of Changes - Adding a heuristic to the GPU scheduler for better MoveToHost overlapping. 🎯 Justification: This could help hide D2H/H2D data movement behind computations. 🚀 Kind of Contribution...

Enable f32 dots by default in YNNPACK. We expect this to be a small speedup of f32 dots by wall clock time, but a significant improvement in CPU time (~30%)....

TensorFlow version: 2.19; Python version: 3.10; Bazel version: 6.5.0; GCC compiler version: 15.2.0; CUDA and cuDNN version: 12.6.1, 9.4.0; ROCm version: 6.2.0; LLVM: 18.1.8 (system side); LLVM ROCm: 18.0.0git; GPU model...

err:Build

📝 Summary of Changes - Adding a knob to control the limit on async-compute resources. This switch provides ample flexibility for control, enabling more asynchronous computations to execute concurrently. In...