onnxruntime Enable ROCm to use tunable GEMM


Open · cloudhan opened this pull request 3 years ago · 3 comments

Related PRs: #12855, #12856, #12857

Description: Enable ROCm to use tunable GEMM for better performance.

Motivation and Context

Why is this change required? What problem does it solve? This drastically improves the performance of some GEMMs and, in turn, the overall performance of BERT inference.

Sep 05 '22 03:09 cloudhan

For the record, here is the performance difference measured with the initial attempt:

Both runs used the same settings: model fbv_bert_fp16_rocm_no_attention_fusion.onnx, graph_optimization_level ENABLE_ALL, intra_op_num_threads 24, batch_size 1024, sequence_length 128, test_cases 10, test_times 10, use_gpu True.

Latency(ms)  Latency_P50  Latency_P75  Latency_P90  Latency_P95  Latency_P99  Throughput(QPS)
113.03       113.01       113.15       113.26       113.38       113.53       9059.37
94.89        94.88        94.92        94.96        94.98        95.02        10791.95
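The comment does not label which run is which. The sketch below assumes the first row (113.03 ms) is the baseline and the second row (94.89 ms) is the run with tunable GEMM; the labels `baseline` and `tuned` are hypothetical. It computes the relative improvement and checks whether the reported QPS is consistent with batch_size × 1000 / mean latency in ms, using only the numbers from the two benchmark runs above.

```python
# Hypothetical labels: assumes row 1 is the baseline and row 2 is the tunable-GEMM run.
baseline = {"latency_ms": 113.03, "throughput_qps": 9059.37}
tuned = {"latency_ms": 94.89, "throughput_qps": 10791.95}
BATCH_SIZE = 1024  # taken from the batch_size column


def implied_qps(latency_ms: float, batch_size: int = BATCH_SIZE) -> float:
    """Samples per second implied by the mean latency of one batch."""
    return batch_size / (latency_ms / 1000.0)


# Lower latency is better, so the speedup is old latency / new latency.
latency_speedup = baseline["latency_ms"] / tuned["latency_ms"]
# Higher throughput is better, so the gain is new QPS / old QPS.
throughput_gain = tuned["throughput_qps"] / baseline["throughput_qps"]

print(f"latency speedup: {latency_speedup:.3f}x")
print(f"throughput gain: {throughput_gain:.3f}x")
for name, run in (("baseline", baseline), ("tuned", tuned)):
    print(
        f"{name}: reported {run['throughput_qps']:.2f} QPS, "
        f"implied {implied_qps(run['latency_ms']):.2f} QPS"
    )
```

Both ratios come out at roughly 1.19x, and the implied throughput agrees with the reported QPS to within a fraction of a query per second, which suggests the QPS column is derived from the mean latency and the batch size rather than measured separately.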

Sep 05 '22 03:09 cloudhan

This PR has been split in two; the follow-up, #13116, contains the enabling and the tests for it.

Sep 28 '22 08:09 cloudhan
