xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Integrate StableHLO at openxla/stablehlo@96acdcb7
Update autotuner to filter out "Cublas_fission" backends. This fixes the `xla_gpu_cublas_fallback` flag behavior.
Refactor: Use `std::align_val_t` for aligned allocation functions. This change updates `AlignedMalloc`, `AlignedSizedFree`, and `AlignedAllocator` to use `std::align_val_t` for specifying alignment, aligning with standard C++ practices for overaligned allocation. Deprecated inline...
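The C++17 aligned allocation functions take the alignment as a `std::align_val_t` argument. The sketch below shows that pattern for a raw allocate/free pair and a minimal allocator. The names `AlignedAllocSketch`, `AlignedSizedFreeSketch`, and `AlignedAllocatorSketch` are hypothetical and do not reproduce the signatures of XLA's `AlignedMalloc`, `AlignedSizedFree`, or `AlignedAllocator`. The alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: allocates `size` bytes aligned to `alignment` using the
// C++17 aligned form of operator new. Throws std::bad_alloc on failure.
void* AlignedAllocSketch(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
  return ::operator new(size, std::align_val_t{alignment});
}

// Hypothetical helper: releases memory obtained from AlignedAllocSketch.
// The size and alignment must match the values used at allocation time.
void AlignedSizedFreeSketch(void* ptr, std::size_t size, std::size_t alignment) noexcept {
  ::operator delete(ptr, size, std::align_val_t{alignment});
}

// Hypothetical minimal allocator forwarding to the aligned operator new/delete.
template <typename T, std::size_t Alignment>
struct AlignedAllocatorSketch {
  using value_type = T;

  // The non-type template parameter blocks automatic rebinding, so spell it out.
  template <typename U>
  struct rebind {
    using other = AlignedAllocatorSketch<U, Alignment>;
  };

  AlignedAllocatorSketch() noexcept = default;
  template <typename U>
  AlignedAllocatorSketch(const AlignedAllocatorSketch<U, Alignment>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
  }
  void deallocate(T* p, std::size_t n) noexcept {
    ::operator delete(p, n * sizeof(T), std::align_val_t{Alignment});
  }
};

// Stateless allocators of the same alignment are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocatorSketch<T, A>&, const AlignedAllocatorSketch<U, A>&) noexcept {
  return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocatorSketch<T, A>&, const AlignedAllocatorSketch<U, A>&) noexcept {
  return false;
}
```

The allocator can back a standard container, for example `std::vector<double, AlignedAllocatorSketch<double, 64>>`. Passing a size or alignment to the sized, aligned `operator delete` that differs from the one used at allocation is undefined behavior.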
[XLA:CPU] Use new generic Eigen intrinsics.
Migrate memory_space_assignment_test_base to PjRt.
Include compilation environment and debug options in split comp. fingerprints.
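When a fingerprint serves as a cache key, everything that can change the compiled output presumably has to feed into it; otherwise two compilations of the same module under different options would collide. The sketch below illustrates that idea only. `FingerprintInputs` and `SplitFingerprint` are hypothetical names, and 64-bit FNV-1a is a stand-in hash; none of this reflects how XLA actually computes these fingerprints.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// 64-bit FNV-1a over a byte string, continuing from a running hash state.
std::uint64_t Fnv1a(std::string_view bytes, std::uint64_t h = 14695981039346656037ULL) {
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical inputs to a fingerprint; each field is assumed to be serialized.
struct FingerprintInputs {
  std::string module_text;              // the computation being compiled
  std::string compilation_environment;  // serialized compilation environment
  std::string debug_options;            // serialized debug options
};

// Hashes every field, with a NUL separator so field boundaries affect the result.
std::uint64_t SplitFingerprint(const FingerprintInputs& in) {
  const std::string_view sep("\0", 1);
  std::uint64_t h = Fnv1a(in.module_text);
  h = Fnv1a(sep, h);
  h = Fnv1a(in.compilation_environment, h);
  h = Fnv1a(sep, h);
  h = Fnv1a(in.debug_options, h);
  return h;
}
```

With this scheme, changing only the debug options changes the fingerprint, so a cache keyed on it will not return an artifact built under different options.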
[XLA:CollectivePipeliner] Fix two issues: 1) Accept transpose as a formatting op in ForwardSink. 2) Do not stop when a large collective was sunk in the previous iteration. Instead, delay sinking...
[XLA:CPU] TargetMachine contains all target information: 1. Makes sure features detected in PjRt are used by the CPU compiler 2. Ensures the target machine is initialized with the requested features This is...
[XLA] Rename TargetConfig to GpuTargetConfig and add CpuTargetConfig to CompilerOptions. We need a way to pass target information to the CPU compiler, and TargetConfig seems to fit that purpose.
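Carrying per-backend target information in compiler options can be sketched as below. Every type and field here (`GpuTargetConfigSketch`, `CpuTargetConfigSketch`, `CompilerOptionsSketch`, `cpu_name`, `features`) is hypothetical and far smaller than XLA's real types. The fallback to a caller-supplied default in `ResolveCpuTarget` is likewise an assumption made for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; XLA's actual GpuTargetConfig/CpuTargetConfig differ.
struct GpuTargetConfigSketch {
  std::string device_description;
};

struct CpuTargetConfigSketch {
  std::string cpu_name;
  std::vector<std::string> features;  // LLVM-style feature strings, e.g. "+avx2"
};

// Each backend reads only the config that applies to it.
struct CompilerOptionsSketch {
  std::optional<GpuTargetConfigSketch> gpu_target_config;
  std::optional<CpuTargetConfigSketch> cpu_target_config;
};

// A CPU compiler could prefer the target the caller supplied (for example,
// features a runtime already detected) and otherwise use a default.
CpuTargetConfigSketch ResolveCpuTarget(const CompilerOptionsSketch& options,
                                       CpuTargetConfigSketch fallback) {
  if (options.cpu_target_config.has_value()) {
    return *options.cpu_target_config;
  }
  return fallback;
}
```

Using `std::optional` keeps "no target supplied" distinct from "an empty feature list was supplied".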
📝 Summary of Changes: All-gathers can only run on the major-most physical dimension, concatenating buffers from ranks. When an all-gather on a logical dimension index > 0 is requested,...
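An all-gather on the major-most dimension reduces to concatenating the rank buffers, while a gather along another logical dimension needs extra data movement. The sketch below simulates ranks as in-process vectors (no real collective library is involved) and expresses a dimension-1 all-gather as a major-most gather followed by a transpose. `AllGatherMajor` and `AllGatherDim1` are hypothetical names, and this is one way to realize the idea, not necessarily what this change implements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each simulated rank holds a rows x cols matrix stored row-major.
using Buffer = std::vector<int>;

// All-gather on the major-most dimension: the result is the plain
// concatenation of the rank buffers, shape [ranks * rows, cols].
Buffer AllGatherMajor(const std::vector<Buffer>& per_rank) {
  Buffer out;
  for (const Buffer& b : per_rank) out.insert(out.end(), b.begin(), b.end());
  return out;
}

// All-gather along logical dimension 1, built from a major-most gather.
// The gathered data has shape [ranks, rows, cols]; transposing it to
// [rows, ranks, cols] and reading row-major yields shape [rows, ranks * cols].
Buffer AllGatherDim1(const std::vector<Buffer>& per_rank, std::size_t rows, std::size_t cols) {
  for (const Buffer& b : per_rank) assert(b.size() == rows * cols);
  const std::size_t ranks = per_rank.size();
  const Buffer gathered = AllGatherMajor(per_rank);
  Buffer out(gathered.size());
  for (std::size_t r = 0; r < ranks; ++r)
    for (std::size_t i = 0; i < rows; ++i)
      for (std::size_t j = 0; j < cols; ++j)
        out[(i * ranks + r) * cols + j] = gathered[(r * rows + i) * cols + j];
  return out;
}
```

With two ranks holding `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`, the major-most gather produces `[[1, 2], [3, 4], [5, 6], [7, 8]]`, and the dimension-1 gather produces `[[1, 2, 5, 6], [3, 4, 7, 8]]`.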