xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Fix build break introduced in ffa7bb5 and df736d7
Column reduction previously supported vectorization; support was removed in [this PR](https://github.com/openxla/xla/commit/72788f177dde61e1efbfd744435aa1985a4ff6c0). Vectorization was disabled because no performance gain was found, and the vectorization heuristic is fairly complex and differs from...
Make it possible to lower fp8 `tt.splat`. Before the fix, `tt.splat` was lowered to e.g. ``` %14 = "llvm.mlir.constant"() : () -> f8E4M3FNUZ ``` which LLVM rejected. Translating the result...
T4 GPUs don't support BF16 matmul. Because of this, XLA switches BF16 matmuls to F32 matmuls on T4 (IIUC). This is obviously much slower, but it turns out it's actually...
Use absl::Status instead of xla::Status now that they're identical.
Use absl::Status instead of xla::Status now that they're identical.
Use absl::Status instead of xla::Status now that they're identical.
[XLA:GPU] Remove GpuStatus.
Use absl::Status instead of xla::Status now that they're identical.