
TPP experimentation on MLIR for linear algebra

37 tpp-mlir issues

Need to profile what's going on here. 99% of the time is spent in libxsmm calls, so why the large variation, and why is the compiler "faster" on Zen and...

The Intel AMX TileConfig hoisting currently uses the AllocaOp for the tile-config state as an anchor op and then attempts to move IntelAMXTileConfigDispatchOp around based on a check of whether the AllocOp...

Currently, we select our optimal blocking on the command line, with a default of `{2,8}` that is optimal for 16 threads. In our benchmarks, we pick the best one for each number...

Tests and benchmarks all work fine, except the ones using compiler packing (both FP32 and BF16).
```
Benchmark: prepacked_targets
gemm_fp32_dnn_target : 79.273 gflops
gemm_bf16_dnn_target : 256.180 gflops
mlp_fp32_dnn_target : 78.956...
```

As noted here: https://github.com/libxsmm/libxsmm-dnn/issues/29#issuecomment-1871502920

Tried running the `torch-mlir`-exported ResNet in `linalg-on-tensor` via `tpp-run` and found a crash. `tpp-opt` works fine, though. *Commands* (install `torch-mlir` using `pip`):
```
$ python examples/torchscript_resnet18_all_output_types.py
$ tpp-opt rn18.mlir -o...
```

`gpu.subgroup_mma_elementwise maxf` lowering to the `nvvm` dialect creates an invalid constant: `%0 = llvm.mlir.constant(0x7E00 : f16) : vector`, which prevents GPU binary generation with: `error: FloatAttr does not match expected type of...

Today, we make wrong packing decisions based on types (e.g., bf16 always means VNNI) instead of target support. We also make a [compile-time decision](https://github.com/plaidml/tpp-mlir/blob/main/lib/TPP/VNNIUtils.cpp#L24) about the packing shapes, which is...

Most of our benchmarks run for seconds, but the MHA one consistently takes over 6 minutes. I'm not sure whether this is something in the compiler (some eager pass, or an unoptimized constant...