XLA issues

Integrate StableHLO at openxla/stablehlo@96acdcb7

Update autotuner to filter out "Cublas_fission" backends.

This fixes the xla_gpu_cublas_fallback flag behavior.

Refactor: Use `std::align_val_t` for aligned allocation functions.

This change updates `AlignedMalloc`, `AlignedSizedFree`, and `AlignedAllocator` to use `std::align_val_t` for specifying alignment, in line with standard C++ practice for overaligned allocation. Deprecated inline...

copybara-service[bot]
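To illustrate the `std::align_val_t` approach described in the entry above, here is a minimal sketch built on the C++17 aligned forms of `operator new` and `operator delete`. The helper names `AlignedAlloc`, `AlignedFree`, and `IsAligned` are hypothetical and are not XLA's actual functions or signatures; the real `AlignedMalloc` and `AlignedSizedFree` may differ in both interface and behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper (not XLA's API): allocate `size` bytes with the
// requested alignment using the C++17 aligned allocation function.
inline void* AlignedAlloc(std::size_t size, std::align_val_t alignment) {
  return ::operator new(size, alignment);
}

// Hypothetical helper (not XLA's API): release memory obtained from
// AlignedAlloc. The alignment must match the one used at allocation.
inline void AlignedFree(void* ptr, std::align_val_t alignment) {
  ::operator delete(ptr, alignment);
}

// Returns true if `ptr` is a multiple of `alignment`.
inline bool IsAligned(const void* ptr, std::align_val_t alignment) {
  const auto address = reinterpret_cast<std::uintptr_t>(ptr);
  return address % static_cast<std::size_t>(alignment) == 0;
}
```

Passing the alignment as `std::align_val_t` rather than a plain integer keeps it from being confused with the size argument at call sites, for example `AlignedAlloc(256, std::align_val_t{64})`.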

[XLA:CollectivePipeliner] Fix two issues:

1) Accept transpose as a formatting op in ForwardSink.
2) Do not stop when a large collective was sunk in the previous iteration. Instead, delay sinking...

copybara-service[bot]

[XLA:CPU] TargetMachine contains all target information

1. Makes sure features detected in PjRt are used by the CPU compiler.
2. Ensures the target machine is initialized with the requested features.
This is...

copybara-service[bot]

[XLA] Rename TargetConfig to GpuTargetConfig and add CpuTargetConfig to CompilerOptions

We need a way to pass target information to the CPU compiler, and TargetConfig seems to fit that purpose.

copybara-service[bot]

[GPU] Optimize all-gathers on non-major dimension using a single transpose.


📝 Summary of Changes: All-gathers can only run on the major-most physical dimension, concatenating buffers from ranks. When an all-gather on a logical dimension index > 0 is requested,...

sergachev
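The idea in the entry above can be sketched on the host as follows. This is only an illustration under stated assumptions, not the implementation in the change: the function `AllGatherDim1` is hypothetical, shards are row-major `[rows, cols]` buffers, and the "all-gather" is simulated by concatenating per-rank buffers. Concatenation yields a `[ranks, rows, cols]` buffer, and a single transpose to `[rows, ranks, cols]` has the same memory layout as the desired `[rows, ranks * cols]` result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side simulation of an all-gather along dimension 1.
// Each shard is a row-major [rows, cols] buffer, one per rank.
std::vector<int> AllGatherDim1(const std::vector<std::vector<int>>& shards,
                               std::size_t rows, std::size_t cols) {
  const std::size_t ranks = shards.size();

  // Step 1: gather on the major-most dimension by concatenating the rank
  // buffers back to back, giving a row-major [ranks, rows, cols] buffer.
  std::vector<int> gathered;
  gathered.reserve(ranks * rows * cols);
  for (const auto& shard : shards) {
    assert(shard.size() == rows * cols);
    gathered.insert(gathered.end(), shard.begin(), shard.end());
  }

  // Step 2: one transpose [ranks, rows, cols] -> [rows, ranks, cols].
  // In row-major order this is the same memory as [rows, ranks * cols].
  std::vector<int> result(ranks * rows * cols);
  for (std::size_t n = 0; n < ranks; ++n) {
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t c = 0; c < cols; ++c) {
        result[(r * ranks + n) * cols + c] = gathered[(n * rows + r) * cols + c];
      }
    }
  }
  return result;
}
```

With two ranks holding `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`, the result is the `[2, 4]` array `[[1, 2, 5, 6], [3, 4, 7, 8]]`, which matches concatenating the shards along dimension 1.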


Metadata

Integrate StableHLO at openxla/stablehlo@96acdcb7

Update autotuner to filter out "Cublas_fission" backends.

Refactor: Use `std::align_val_t` for aligned allocation functions.

[XLA:CPU] Use new generic Eigen intrinsics.

Migrate memory_space_assignment_test_base to PjRt.

Include compilation environment and debug options in split comp. fingerprints.

[XLA:CollectivePipeliner] Fix two issues:

[XLA:CPU] TargetMachine contains all target information

[XLA] Rename TargetConfig to GpuTargetConfig and add CpuTargetConfig to CompilerOptions

[GPU] Optimize all-gathers on non-major dimension using a single transpose.
