xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
📝 Summary of Changes This change adds PJRT_Triton_Extension support for ROCm as the counterpart of the existing CUDA support. On ROCm, Pallas Triton calls are lowered directly to HSACO rather than to PTX...
Reverts f4d835b4b9b734953bd9eac84b8aa5358f9f6ffa
📝 Summary of Changes Properly support asan/tsan builds with RBE by providing the lists through the run_under script 🎯 Justification The asan and tsan configs were missing the run_under wrapper, so...
[XLA:GPU] Make flop_per_ns_per_fpu a double in CalculateEffectiveFlopsPerNs Otherwise we were undercounting effective flops. Example for H100 at full occupancy: fpu_count = n_active_core * n_active_fpus_per_core; // 132 * 128 = 16896...
PR #30855: [ROCM] CommandBuffer support for CollectivePermute op Imported from GitHub PR https://github.com/openxla/xla/pull/30855 📝 Summary of Changes - added CommandBuffer support for CollectivePermute op 🎯 Justification These ops were missing...
[XLA:GPU] Add more informative error messages to CHECKs in GpuPerformanceModel.
📝 Summary of Changes - added CommandBuffer support for the CollectivePermute op 🎯 Justification These ops were missing for whatever reason; this results in graph fragmentation, especially for large models. Hence...
Use `absl::StrAppend` for string concatenation. Replace `+= absl::StrCat` with `absl::StrAppend` for more efficient string appending.
📝 Summary of Changes After this change to `GpuComputeCapability` (https://github.com/openxla/xla/commit/11b6a3db362f30b79d385f24523d184592132f11), `RocmExecutorTest::CreateDeviceDescription` had a build break. 🚀 Kind of Contribution 🐛 Bug Fix
[XLA:GPU] Use IsTritonSupportedDataType from the new support checks