A machine learning compiler for GPUs, CPUs, and ML accelerators

Results 653 xla issues

This PR uses cuGraphInstantiateWithParams instead of cuGraphInstantiate to instantiate CUDA graph executors, so the existing command_buffer_cmd_test and command_buffer_thunk_test should cover the changes in this PR.

Adds Python bindings for `xla_gpu_kernel_cache_file`, `xla_gpu_enable_llvm_module_compilation_parallelism` and `xla_gpu_per_fusion_autotune_cache_dir`. We would like to add some convenience features to JAX that enable all caches with one flag/option (will open PR for...
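
The bindings themselves are not shown in this listing. As a rough sketch of what "enable all caches at once" could look like today, the three options can be assembled into an `XLA_FLAGS` string. The helper name `build_xla_cache_flags`, the cache directory, and the file names under it are hypothetical and chosen only for illustration.

```python
import os


def build_xla_cache_flags(cache_dir: str) -> str:
    """Build an XLA_FLAGS string that sets the three options named above.

    The layout under cache_dir (file and directory names) is an assumption
    made for this example, not something the PR prescribes.
    """
    flags = {
        "xla_gpu_kernel_cache_file": os.path.join(cache_dir, "kernel_cache.pb"),
        "xla_gpu_enable_llvm_module_compilation_parallelism": "true",
        "xla_gpu_per_fusion_autotune_cache_dir": os.path.join(cache_dir, "autotune"),
    }
    return " ".join(f"--{name}={value}" for name, value in flags.items())


# The environment variable has to be set before the XLA backend initializes,
# i.e. before the first JAX computation runs in the process.
os.environ["XLA_FLAGS"] = build_xla_cache_flags("/tmp/xla_cache")
```

A single JAX-level option, as the PR description proposes, would presumably hide this bookkeeping behind one switch.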

As the 2nd part of #15092. NOTE: this feature relies on cudnn-frontend v1.6.1 which is not in XLA yet.

Host Offloading: Process "MoveToHost" instructions in the order they are executed. - This ensures we process "MoveToHost" instructions that reside at the beginning of a host memory instruction offload chain....

Divides the solver timeout budget equally across all mesh shapes & partial mesh shapes (instead of allowing each invocation to consume the full timeout budget).
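
The budgeting rule described above amounts to simple arithmetic. The following is a minimal sketch, not XLA's solver code; the function name `split_timeout_budget` and the example mesh shapes are hypothetical.

```python
def split_timeout_budget(total_seconds, mesh_shapes):
    """Give each mesh shape an equal share of the total solver timeout.

    mesh_shapes is assumed to already contain both the full and the partial
    mesh shapes that will each trigger one solver invocation.
    """
    if not mesh_shapes:
        return {}
    share = total_seconds / len(mesh_shapes)
    return {tuple(shape): share for shape in mesh_shapes}


# Three solver invocations share one 600-second budget instead of each
# being allowed to consume all 600 seconds.
budgets = split_timeout_budget(600.0, [(8, 1), (4, 2), (2, 4)])
```

With an equal split the worst-case total solve time stays bounded by the configured budget, whereas giving every invocation the full budget lets the worst case grow with the number of mesh shapes.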

Allow custom call computations to contain subcomputations

[XLA:MSA] Added flags to enable/disable async copy and async slice replacements in memory space assignment. Both features are enabled by default.