tpp-mlir
Graviton 3 packing not working
Tests and benchmarks all work fine, except the benchmarks that use compiler packing (both FP32 and BF16), which drop to 0.05 gflops or less:
Benchmark: prepacked_targets
gemm_fp32_dnn_target : 79.273 gflops
gemm_bf16_dnn_target : 256.180 gflops
mlp_fp32_dnn_target : 78.956 gflops
mlp_bf16_dnn_target : 254.930 gflops
gemm_fp32_mlir : 78.429 gflops
gemm_bf16_dp4_mlir : 253.889 gflops
mlp_fp32_mlir : 78.576 gflops
mlp_bf16_dp4_mlir : 250.948 gflops
Benchmark: gemm_models
fp32_3x1024_const_mlir : 0.050 gflops
fp32_3x1024_args_mlir : 0.002 gflops
bf16_3x1024_const_mlir : 0.050 gflops
bf16_3x1024_args_mlir : 0.002 gflops
Benchmark: mlp_models
fp32_3x1024_const_mlir : 0.050 gflops
fp32_3x1024_args_mlir : 0.002 gflops
bf16_3x1024_const_mlir : 0.050 gflops
bf16_3x1024_args_mlir : 0.002 gflops
Benchmark: torch_dynamo
gemm_fp32_torch : 0.050 gflops
gemm_bf16_torch : 0.050 gflops
mlp_fp32_torch : 0.050 gflops
mlp_bf16_torch : 0.050 gflops
This used to work around early January, so the regression is recent. I won't have time to bisect until CGO, so I'll leave this here and just not report packing numbers on Arm.
We need a Graviton builder that runs at least once a day. But we also need a benchmark run that fails under certain conditions (for example, when throughput collapses like this), which we don't have either. 😭
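As a rough idea of what a "failing benchmark" check could look like, here is a minimal sketch that parses output in the format pasted above and returns a non-zero status when any entry falls below a floor. The function names, the single global threshold, and the idea of running it as a post-processing step are all assumptions for illustration; none of this exists in the repo's benchmark driver as far as this issue shows.

```python
import re

# Matches result lines such as "gemm_fp32_mlir : 78.429 gflops".
RESULT_RE = re.compile(r"^\s*(\S+)\s*:\s*([0-9]*\.?[0-9]+)\s+gflops\s*$")
# Matches group headers such as "Benchmark: gemm_models".
GROUP_RE = re.compile(r"^\s*Benchmark:\s*(\S+)\s*$")


def parse_results(text):
    """Return {(group, name): gflops} parsed from benchmark output text."""
    results = {}
    group = None
    for line in text.splitlines():
        m = GROUP_RE.match(line)
        if m:
            group = m.group(1)
            continue
        m = RESULT_RE.match(line)
        if m:
            results[(group, m.group(1))] = float(m.group(2))
    return results


def find_failures(results, min_gflops):
    """Return the sorted (group, name) keys whose throughput is below min_gflops."""
    return sorted(key for key, value in results.items() if value < min_gflops)


def check(text, min_gflops):
    """Print one line per failing entry; return 1 if any failed, else 0.

    min_gflops is a hypothetical floor; a real check would likely need
    per-benchmark or per-machine baselines rather than one global value.
    """
    failures = find_failures(parse_results(text), min_gflops)
    for group, name in failures:
        print(f"FAIL {group}/{name}: below {min_gflops} gflops")
    return 1 if failures else 0
```

A CI job could pipe the benchmark output into something like `sys.exit(check(sys.stdin.read(), 1.0))`. With the numbers in this issue, a floor of 1.0 gflops would pass everything in `prepacked_targets` and flag every entry in the other three groups.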