Add optional flag_gems support
Import FlagGems (https://github.com/FlagOpen/FlagGems), an optional library of Triton kernels, as an additional benchmark backend. FlagGems support covers the softmax and addmm operators.
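A minimal sketch of how an optional dependency like this can be guarded, so the benchmark still runs when FlagGems is not installed. The helper names `optional_import` and `flaggems_context` are hypothetical and not taken from this change, and the `use_gems()` context manager is an assumption about the FlagGems API that should be checked against the installed version.

```python
import contextlib
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# `flag_gems` is the import name used in the title of this change.
flag_gems = optional_import("flag_gems")
HAS_FLAG_GEMS = flag_gems is not None


def flaggems_context():
    """Return a context manager that routes ops through FlagGems if available.

    Assumption: FlagGems exposes a `use_gems()` context manager. If the
    library or that attribute is missing, fall back to a no-op context.
    """
    if HAS_FLAG_GEMS and hasattr(flag_gems, "use_gems"):
        return flag_gems.use_gems()
    return contextlib.nullcontext()
```

With a guard like this, a FlagGems benchmark variant can be registered only when `HAS_FLAG_GEMS` is true, and skipped otherwise.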
Test plan:
$ python run_benchmark.py triton --op addmm --only flaggems,triton_addmm --num-inputs 2 --metrics latency,gbps,tflops
(M, N, K)           flaggems-gbps  flaggems-latency  flaggems-tflops  triton_addmm-gbps  triton_addmm-latency  triton_addmm-tflops
------------------  -------------  ----------------  ---------------  -----------------  --------------------  -------------------
(20120, 512, 1536)        220.794          0.473686           66.808            234.791              0.445449               71.043
(34579, 512, 1536)        224.696           0.79493          68.4186            231.003              0.773224              70.3393
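As a sanity check on the columns, the reported tflops values are consistent with counting 2*M*N*K floating-point operations per call and reading latency in milliseconds. That relationship is inferred from the numbers above, not taken from the benchmark source, and the function name `addmm_tflops` is hypothetical.

```python
def addmm_tflops(m, n, k, latency_ms):
    """TFLOPS for an (M, K) x (K, N) matmul, counting 2*M*N*K FLOPs.

    Assumption: latency is in milliseconds and the bias add is not counted.
    """
    flops = 2 * m * n * k
    seconds = latency_ms * 1e-3
    return flops / seconds / 1e12


# (shape, flaggems latency, flaggems tflops) copied from the table above.
rows = [
    ((20120, 512, 1536), 0.473686, 66.808),
    ((34579, 512, 1536), 0.79493, 68.4186),
]

for (m, n, k), latency_ms, reported in rows:
    computed = addmm_tflops(m, n, k, latency_ms)
    print(f"{(m, n, k)}: computed {computed:.3f}, reported {reported}")
```

The computed values agree with the reported flaggems-tflops column to within rounding of the latency figures.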