
TPP experimentation on MLIR for linear algebra

37 tpp-mlir issues

Need to profile what's going on here. 99% of the time is spent in libxsmm calls, so why the large variation, and why is the compiler "faster" on Zen and...

The Intel AMX TileConfig hoisting currently uses the AllocaOp for the tile-config state as an anchor op and then attempts to move IntelAMXTileConfigDispatchOp around based on a check of whether the AllocOp...

Currently, we select our optimal blocking on the command line, with a default of `{2,8}` that is optimal for 16 threads. In our benchmarks, we pick the best one for each number...

Tests and benchmarks all work fine, except the ones using compiler packing (both FP32 and BF16).
```
Benchmark: prepacked_targets
gemm_fp32_dnn_target : 79.273 gflops
gemm_bf16_dnn_target : 256.180 gflops
mlp_fp32_dnn_target : 78.956...
```

As noted here: https://github.com/libxsmm/libxsmm-dnn/issues/29#issuecomment-1871502920

Tried running the `torch-mlir`-exported ResNet in `linalg-on-tensor` via `tpp-run` and found a crash. `tpp-opt` works fine, though. *Commands* (install `torch-mlir` using `pip`):
```
$ python examples/torchscript_resnet18_all_output_types.py
$ tpp-opt rn18.mlir -o...
```

`gpu.subgroup_mma_elementwise maxf` lowering to the `nvvm` dialect creates an invalid constant: `%0 = llvm.mlir.constant(0x7E00 : f16) : vector`, which prevents GPU binary generation with: `error: FloatAttr does not match expected type of...

Today, we make wrong packing decisions based on types (e.g., bf16 always means VNNI) instead of target support. We also make a [compile-time decision](https://github.com/plaidml/tpp-mlir/blob/main/lib/TPP/VNNIUtils.cpp#L24) about the packing shapes, which is...

Most of our benchmarks run for seconds, but the MHA one consistently takes over 6 minutes. I'm not sure whether this is something in the compiler (some eager pass, or an unoptimized constant...