
A machine learning compiler for GPUs, CPUs, and ML accelerators

Results 653 xla issues

📝 Summary of Changes This change adds PJRT_Triton_Extension support for ROCm as a counterpart to the existing CUDA support. On ROCm, Pallas Triton calls are lowered directly to HSACO rather than to PTX...

📝 Summary of Changes Properly support asan/tsan builds with RBE by providing the lists through the run_under script 🎯 Justification The asan and tsan configs were missing the run_under wrapper, so...

[XLA:GPU] Make flop_per_ns_per_fpu a double in CalculateEffectiveFlopsPerNs Otherwise we were undercounting effective flops. Example for H100 at full occupancy: fpu_count = n_active_core * n_active_fpus_per_core; // 132 * 128 = 16896...

PR #30855: [ROCM] CommandBuffer support for CollectivePermute op Imported from GitHub PR https://github.com/openxla/xla/pull/30855 📝 Summary of Changes - added CommandBuffer support for CollectivePermute op 🎯 Justification These ops were missing...

[XLA:GPU] Add more informative error messages to CHECKs in GpuPerformanceModel.

📝 Summary of Changes - added CommandBuffer support for CollectivePermute op 🎯 Justification These ops were missing for whatever reason: this results in graph fragmentation especially for large models. Hence...

Use `absl::StrAppend` for string concatenation. Replace `+= absl::StrCat` with `absl::StrAppend` for more efficient string appending.

📝 Summary of Changes After this change to `GpuComputeCapability` https://github.com/openxla/xla/commit/11b6a3db362f30b79d385f24523d184592132f11, `RocmExecutorTest:CreateDeviceDescription` had a build break. 🚀 Kind of Contribution 🐛 Bug Fix

[XLA:GPU] Use IsTritonSupportedDataType from the new support checks