xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
[Autotuner] Make buffer checking best-effort rather than forcing it. - There are cases in gemm_fusion_autotuner where we don't have a reference output from cuBLAS and we skip the requested...
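A minimal sketch of the best-effort pattern this entry describes, under the assumption that "skip" means treating a missing reference as a pass; the names (`CheckCandidateOutput`, `CompareBuffers`) are illustrative placeholders, not the actual autotuner API:

```cpp
#include <iostream>
#include <optional>
#include <vector>

// Illustrative stand-in for the autotuner's output buffers; not the real XLA types.
using Buffer = std::vector<float>;

bool CompareBuffers(const Buffer& a, const Buffer& b) {
  return a == b;  // Placeholder for a tolerance-based comparison.
}

// Best-effort check: if no cuBLAS reference output is available, skip the
// comparison instead of failing the autotuning run.
bool CheckCandidateOutput(const std::optional<Buffer>& reference_output,
                          const Buffer& candidate_output) {
  if (!reference_output.has_value()) {
    std::cerr << "No reference output available; skipping buffer check.\n";
    return true;  // Accept the candidate rather than erroring out.
  }
  return CompareBuffers(*reference_output, candidate_output);
}
```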
[IFRT Proxy] Make `ifrt_proxy::client::LoadedExecutable` implement `MpmdLoadedExecutableInterface`. This change updates the `LoadedExecutable` class in the IFRT proxy client to inherit from `xla::ifrt::MpmdLoadedExecutableInterface` and adds declarations for the MPMD-specific methods.
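A self-contained C++ sketch of the inheritance pattern this change introduces; the interface body and the method name below are simplified placeholders, not the actual `xla::ifrt` declarations:

```cpp
#include <string>
#include <vector>

// Placeholder for xla::ifrt::MpmdLoadedExecutableInterface: an abstract
// interface exposing MPMD-specific queries on a loaded executable.
class MpmdLoadedExecutableInterface {
 public:
  virtual ~MpmdLoadedExecutableInterface() = default;
  // Hypothetical MPMD-specific method, for illustration only.
  virtual std::vector<std::string> MpmdProgramNames() const = 0;
};

// Placeholder for ifrt_proxy::client::LoadedExecutable, now deriving from the
// MPMD interface and declaring overrides for its methods.
class LoadedExecutable : public MpmdLoadedExecutableInterface {
 public:
  std::vector<std::string> MpmdProgramNames() const override {
    // In the proxy client this would be answered via an RPC to the server.
    return {"main"};
  }
};
```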
[XLA:GPU] Enable dynamic slice support; replace usages of the legacy IsTritonSupportedDynamicSlice.
Reverts 9cf521108c2b54328e973c05bd19941c476a5c3c
Move CustomKernelThunk into its own file. CustomKernelThunk is currently declared in kernel_thunk.h; this change moves it into its own file, custom_kernel_thunk.h. The same is done for the implementation (kernel_thunk.cc...
KernelSpecTest improvements and cleanups - Improves how we invent pointers to CUDA kernels - Adds parameter comments for ambiguous parameters - Makes use of `ParseTextProtoOrDie`
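A sketch of the kind of test helper `ParseTextProtoOrDie` refers to; the wrapper below is written from scratch around protobuf's `TextFormat` and is an assumption about its behavior, not the actual XLA/TSL utility:

```cpp
#include <string>

#include "absl/log/check.h"
#include "google/protobuf/duration.pb.h"
#include "google/protobuf/text_format.h"

// Parses a text-format proto literal, aborting on malformed input so test
// fixtures stay short and failures are loud.
template <typename T>
T ParseTextProtoOrDie(const std::string& text) {
  T message;
  CHECK(google::protobuf::TextFormat::ParseFromString(text, &message))
      << "Failed to parse text proto:\n" << text;
  return message;
}

// Example use in a test body (Duration is just a convenient well-known type).
void ExampleUsage() {
  auto d = ParseTextProtoOrDie<google::protobuf::Duration>(
      "seconds: 5 nanos: 250000000");
  CHECK_EQ(d.seconds(), 5);
}
```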
📝 Summary of Changes - Adding a heuristic to the GPU scheduler for better MoveToHost overlapping. 🎯 Justification This could help hide D2H/H2D data movement behind computations. 🚀 Kind of Contribution...
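The change itself lives in XLA's scheduler; the sketch below is not XLA code, just a plain CUDA-runtime illustration of the overlap it aims for, where a device-to-host copy issued on its own stream proceeds concurrently with compute on another stream (the kernel launch is left as a comment):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  constexpr size_t kBytes = 64 << 20;
  void *device_buf = nullptr, *host_pinned = nullptr;
  cudaMalloc(&device_buf, kBytes);
  cudaMallocHost(&host_pinned, kBytes);  // Pinned memory enables async D2H.

  cudaStream_t copy_stream, compute_stream;
  cudaStreamCreate(&copy_stream);
  cudaStreamCreate(&compute_stream);

  // Start the D2H transfer early on the copy stream...
  cudaMemcpyAsync(host_pinned, device_buf, kBytes, cudaMemcpyDeviceToHost,
                  copy_stream);
  // ...while independent work runs on the compute stream; the two overlap.
  // my_kernel<<<grid, block, 0, compute_stream>>>(...);

  cudaStreamSynchronize(copy_stream);
  cudaStreamSynchronize(compute_stream);
  std::printf("D2H copy overlapped with compute.\n");

  cudaFreeHost(host_pinned);
  cudaFree(device_buf);
  cudaStreamDestroy(copy_stream);
  cudaStreamDestroy(compute_stream);
  return 0;
}
```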
Enable f32 dots by default in YNNPACK. We expect this to be a small speedup for f32 dots in wall-clock time, but a significant improvement in CPU time (~30%)....
TensorFlow version: 2.19; Python version: 3.10; Bazel version: 6.5.0; GCC compiler version: 15.2.0; CUDA and cuDNN versions: 12.6.1 / 9.4.0; ROCm version: 6.2.0; LLVM: 18.1.8 (system side); LLVM ROCm: 18.0.0git; GPU model...
📝 Summary of Changes - Adding a knob to control the limit on the async-compute resource. This switch provides ample flexibility for control, enabling more asynchronous computations to execute concurrently. In...
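The flag name and its plumbing are not shown in the truncated entry; the toy below only illustrates what such a knob controls, under the assumption that it caps how many asynchronous computations may be in flight at once, so a larger limit yields more overlap:

```cpp
#include <algorithm>
#include <cstdio>

// Toy model: each async op occupies one unit of the async-compute resource for
// one step. At most `async_compute_limit` ops run concurrently, so raising the
// limit drains the same work in fewer steps. Illustration only, not XLA's scheduler.
int StepsToDrain(int num_async_ops, int async_compute_limit) {
  int steps = 0;
  int remaining = num_async_ops;
  while (remaining > 0) {
    remaining -= std::min(remaining, async_compute_limit);
    ++steps;
  }
  return steps;
}

int main() {
  for (int limit : {1, 2, 4}) {
    std::printf("limit=%d -> %d steps for 8 async ops\n", limit,
                StepsToDrain(/*num_async_ops=*/8, limit));
  }
  return 0;
}
```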