tpp-mlir

MHA benchmarks are taking too long

Open rengolin opened this issue 1 year ago • 5 comments

Most benchmarks we have run for seconds, but the MHA one consistently takes over 6 min. I'm not sure whether this is something in the compiler (some eager pass, or unoptimized constant folding) or whether the execution itself is really slow, but we need to solve it to be able to increase the number of iterations on weekly benchmarks consistently across the board.

@chelini

rengolin avatar Nov 23 '23 14:11 rengolin

All of them are fast enough, except fp32-mha-tensorflow-seq-len-1024.mlir. Each kernel runs for 5s on my machine, as opposed to 80ms for fp32-mha-tensorflow-seq-len-32.mlir. The ratio between the two is ~2% for both flops (141Gflops vs. 3.3Gflops) and time, so it's probably just a very big model that we could make smaller without breaking the objective of the test?

Maybe sequence length = 128 would be a good compromise?
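As a quick sanity check of the ~2% figure, here is a minimal sketch that uses only the numbers quoted in this comment. The variable names are illustrative, and the script only takes the smaller-over-larger ratio of each pair; it measures nothing.

```python
# Numbers quoted in the comment above; nothing here is measured.
time_small_s = 0.080  # per-kernel time for fp32-mha-tensorflow-seq-len-32.mlir
time_large_s = 5.0    # per-kernel time for fp32-mha-tensorflow-seq-len-1024.mlir
gflops_high = 141.0   # the larger of the two quoted flops figures
gflops_low = 3.3      # the smaller of the two quoted flops figures

# Smaller value over larger value for each pair.
time_ratio = time_small_s / time_large_s
flops_ratio = gflops_low / gflops_high

print(f"time ratio:  {time_ratio:.1%}")   # time ratio:  1.6%
print(f"flops ratio: {flops_ratio:.1%}")  # flops ratio: 2.3%
```

Both ratios round to roughly 2%, which matches the estimate in the comment.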

rengolin avatar Nov 23 '23 15:11 rengolin

I've disabled the big one for now to run more of the rest of the benchmarks. We can re-enable it once we have a smaller version.

rengolin avatar Nov 23 '23 18:11 rengolin

See: https://github.com/plaidml/tpp-mlir/commit/8cb9ca822edde668fe687b9c73e01e9e0d0d2496

chelini avatar Dec 07 '23 09:12 chelini

With a sequence length of 64, performance is similar to 32: 22.387 gflops. Should we try 128?

chelini avatar Dec 07 '23 10:12 chelini

Try some variations and see the effects on the CI time. It should be similar to the rest. This benchmark job isn't about performance measurements, but baselines, so we don't need huge things here.

Another option is to move bigger things into the "performance" job, which runs weekly so we don't care.
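One way to try variations and compare their effect on run time is a small wall-clock harness. The sketch below is generic and is not the project's benchmark driver: `time_command`, `compare`, and the variant labels are hypothetical, and the commands are no-op placeholders to be replaced with whatever invocation the CI job actually uses for each MHA variant.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so failures are not timed silently.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare(variants, repeats=3):
    """Time each labelled command and return {label: seconds}."""
    return {label: time_command(argv, repeats) for label, argv in variants.items()}


if __name__ == "__main__":
    # Placeholder commands (assumption): substitute the real benchmark
    # invocation for each sequence-length variant.
    variants = {
        "seq-len-64": [sys.executable, "-c", "pass"],
        "seq-len-128": [sys.executable, "-c", "pass"],
    }
    for label, seconds in compare(variants).items():
        print(f"{label}: {seconds:.3f}s")
```

Taking the best of a few repeats reduces noise from machine load, which is enough for checking that a variant's run time is in line with the other benchmarks.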

rengolin avatar Dec 07 '23 10:12 rengolin