xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
This PR uses cuGraphInstantiateWithParams instead of cuGraphInstantiate to instantiate CUDA graph executors; the existing command_buffer_cmd_test and command_buffer_thunk_test should cover the changes in this PR.
Adds Python bindings for `xla_gpu_kernel_cache_file`, `xla_gpu_enable_llvm_module_compilation_parallelism`, and `xla_gpu_per_fusion_autotune_cache_dir`. We would like to add some convenience features to JAX that enable all caches with one flag/option (will open PR for...
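As a hedged sketch of how these debug options are typically consumed, XLA reads its debug flags from the `XLA_FLAGS` environment variable, which must be set before the runtime (e.g. JAX) is imported. The flag names below come from the entry above; the cache paths are illustrative placeholders, not defaults.

```python
import os

# Illustrative only: enable the new cache-related options via XLA_FLAGS.
# The paths are hypothetical; set them before importing jax so the
# XLA runtime picks them up at initialization.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_gpu_kernel_cache_file=/tmp/xla_kernel_cache",
    "--xla_gpu_enable_llvm_module_compilation_parallelism=true",
    "--xla_gpu_per_fusion_autotune_cache_dir=/tmp/xla_autotune",
])

print(os.environ["XLA_FLAGS"])
```

The Python bindings added by this PR would let these options be set programmatically instead of through the environment variable.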
Run build cleaner tooling on StableHLO
#sdy remove IdentityOp as it's no longer needed.
This is the second part of #15092. NOTE: this feature relies on cudnn-frontend v1.6.1, which is not in XLA yet.
Host Offloading: Process "MoveToHost" instructions in the order they are executed. This ensures we process "MoveToHost" instructions that reside at the beginning of a host-memory offload chain....
Divides the solver timeout budget equally across all mesh shapes and partial mesh shapes, instead of allowing each invocation to consume the full timeout budget.
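The budgeting change above amounts to a simple division of the total timeout across invocations. A minimal sketch, with a hypothetical helper name and example numbers (not taken from the PR):

```python
def per_invocation_timeout(total_timeout_s: float, num_mesh_shapes: int) -> float:
    """Split the solver's total timeout budget equally across mesh shapes.

    Previously each invocation could consume the full budget, so solving
    N mesh shapes could take up to N * total_timeout_s in the worst case.
    """
    if num_mesh_shapes <= 0:
        raise ValueError("need at least one mesh shape")
    return total_timeout_s / num_mesh_shapes

# Example: a 600 s budget over 4 mesh shapes gives 150 s per invocation.
print(per_invocation_timeout(600.0, 4))  # → 150.0
```

This bounds the worst-case total solve time at roughly `total_timeout_s`, at the cost of giving each individual mesh shape less time.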
Allow custom call computations to contain subcomputations
Expose stablehlo version through the PJRT C API.
[XLA:MSA] Added flags to enable/disable async copy and async slice replacements in memory space assignment. Both features are enabled by default.