onnxruntime

Enable ROCm to use tunable GEMM

Open • cloudhan opened this issue 3 years ago • 3 comments

Related PRs: #12855, #12856, #12857

Description: Enable ROCm to use tunable GEMM for better performance.

Motivation and Context

  • Why is this change required? What problem does it solve? This drastically improves the performance of some GEMMs and, as a result, the overall performance of BERT inference.
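The issue does not spell out how tunable GEMM works. As a rough illustration of the general idea only (not ONNX Runtime's actual TunableOp code), a tunable operator times several candidate kernels for a given problem shape, remembers the fastest, and reuses it on later calls with the same shape. The sketch assumes two pure-Python GEMM candidates (`gemm_naive`, `gemm_ikj`) and a hypothetical `TunableGemm` wrapper; on ROCm the candidates would be GPU kernels, for example from libraries like rocBLAS.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
GemmFn = Callable[[Matrix, Matrix], Matrix]


def gemm_naive(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical candidate 1: textbook i-j-k loop order."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def gemm_ikj(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical candidate 2: i-k-j loop order, walks rows of B contiguously."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        ci = c[i]
        for p in range(k):
            aip, bp = a[i][p], b[p]
            for j in range(n):
                ci[j] += aip * bp[j]
    return c


class TunableGemm:
    """Hypothetical wrapper: benchmark candidates once per (m, k, n), then reuse the winner."""

    def __init__(self, candidates: Dict[str, GemmFn], warmup: int = 1, iters: int = 3):
        self.candidates = candidates
        self.warmup = warmup
        self.iters = iters
        self.best: Dict[Tuple[int, int, int], str] = {}  # shape -> fastest candidate name

    def _time(self, fn: GemmFn, a: Matrix, b: Matrix) -> float:
        for _ in range(self.warmup):  # untimed warm-up runs
            fn(a, b)
        start = time.perf_counter()
        for _ in range(self.iters):
            fn(a, b)
        return (time.perf_counter() - start) / self.iters  # mean seconds per call

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        key = (len(a), len(b), len(b[0]))  # (m, k, n)
        if key not in self.best:  # tune only on the first call for this shape
            timings = {name: self._time(fn, a, b) for name, fn in self.candidates.items()}
            self.best[key] = min(timings, key=timings.get)
        return self.candidates[self.best[key]](a, b)


# Usage: the first call for a 2x2 @ 2x2 problem tunes, later calls reuse the choice.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
gemm = TunableGemm({"naive": gemm_naive, "ikj": gemm_ikj})
c = gemm(a, b)
print(c, gemm.best)
```

The choice is cached per shape because the fastest kernel usually depends on m, k and n, so the tuning cost is paid once per shape rather than on every call.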

cloudhan • Sep 05 '22 03:09

For the record, here is the performance difference observed with the initial try:

Both runs used fbv_bert_fp16_rocm_no_attention_fusion.onnx with graph_optimization_level=ENABLE_ALL, intra_op_num_threads=24, batch_size=1024, sequence_length=128, test_cases=10, test_times=10, use_gpu=True.

| Latency (ms) | P50 (ms) | P75 (ms) | P90 (ms) | P95 (ms) | P99 (ms) | Throughput (QPS) |
|---|---|---|---|---|---|---|
| 113.03 | 113.01 | 113.15 | 113.26 | 113.38 | 113.53 | 9059.37 |
| 94.89 | 94.88 | 94.92 | 94.96 | 94.98 | 95.02 | 10791.95 |
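The relative gain can be worked out directly from the two benchmark rows. This sketch assumes the first row is the baseline and the second is the run with tunable GEMM (the comment does not label the rows explicitly). It also checks that the reported throughput is roughly batch_size divided by latency in seconds.

```python
# Numbers copied from the two benchmark rows above.
# Assumption: row 1 is the baseline, row 2 the run with tunable GEMM.
BATCH_SIZE = 1024
baseline = {"latency_ms": 113.03, "qps": 9059.37}
tuned = {"latency_ms": 94.89, "qps": 10791.95}

# Fractional latency reduction and the equivalent speedup factor.
latency_reduction = (baseline["latency_ms"] - tuned["latency_ms"]) / baseline["latency_ms"]
speedup = baseline["latency_ms"] / tuned["latency_ms"]
qps_gain = tuned["qps"] / baseline["qps"]

# Sanity check: throughput should be about batch_size / latency in seconds.
implied_qps = BATCH_SIZE / (baseline["latency_ms"] / 1000)

print(f"latency reduction: {latency_reduction:.1%}")
print(f"speedup: {speedup:.2f}x, throughput gain: {qps_gain:.2f}x")
print(f"implied baseline QPS: {implied_qps:.0f}")
```

Under that assumption, latency drops by about 16% and throughput rises by about 1.19x, and the two figures agree because QPS here is essentially batch_size / latency.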

cloudhan • Sep 05 '22 03:09

This PR has been split in two; the follow-up, #13116, contains the enabling and the tests for it.

cloudhan • Sep 28 '22 08:09