
Integrate hipblaslt calls in gemm op

Open mqhc2020 opened this issue 1 year ago • 4 comments

mqhc2020 avatar Jan 22 '24 08:01 mqhc2020

Codecov Report

Attention: Patch coverage is 91.44444%, with 77 lines in your changes missing coverage. Please review.

Project coverage is 91.82%. Comparing base (f0cb545) to head (f802ed1). Report is 100 commits behind head on develop.

:exclamation: Current head f802ed1 differs from pull request most recent head 1913b56

Please upload reports for the commit 1913b56 to get more accurate results.

Files Patch % Lines
src/module.cpp 82.11% 27 Missing :warning:
src/argument.cpp 0.00% 7 Missing :warning:
src/shape.cpp 0.00% 7 Missing :warning:
src/cpp_generator.cpp 40.00% 6 Missing :warning:
src/fuse_pointwise_reduce.cpp 0.00% 6 Missing :warning:
src/split_reduce.cpp 94.04% 5 Missing :warning:
src/onnx/parse_matmul.cpp 96.05% 3 Missing :warning:
src/program.cpp 70.00% 3 Missing :warning:
src/fuse_reduce.cpp 95.65% 2 Missing :warning:
src/onnx/parse_convolution.cpp 97.75% 2 Missing :warning:
... and 7 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2671      +/-   ##
===========================================
- Coverage    91.83%   91.82%   -0.01%     
===========================================
  Files          479      486       +7     
  Lines        18340    18991     +651     
===========================================
+ Hits         16842    17438     +596     
- Misses        1498     1553      +55     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Jan 22 '24 08:01 codecov[bot]

This should be a different operator. We can add a compile_blas pass, similar to compile_miopen (which does the tuning for convolution), driven through the compile function in the operation interface. We just insert a placeholder operator during lowering, and then compile_blas runs the compile function to find which solution is fastest. That function can then return the measured time and the needed workspace size.

The workspace should be inserted as an allocate instruction so the memory can be reused by non-hipblas kernels. By returning the workspace size, we can insert the needed allocate instruction and then pass that buffer as a parameter to the hipblas operator.

pfultz2 avatar Feb 17 '24 00:02 pfultz2

Test	Batch	Rate new (f802ed)	Rate old (bceef1)	Diff	Compare
torchvision-resnet50 64 1,713.86 1,712.95 0.05% :white_check_mark:
torchvision-resnet50_fp16 64 3,811.66 3,811.61 0.00% :white_check_mark:
torchvision-densenet121 32 1,454.57 1,458.96 -0.30% :white_check_mark:
torchvision-densenet121_fp16 32 2,438.67 2,432.65 0.25% :white_check_mark:
torchvision-inceptionv3 32 883.89 883.89 0.00% :white_check_mark:
torchvision-inceptionv3_fp16 32 623.23 1,381.08 -54.87% :red_circle:
cadene-inceptionv4 16 400.41 408.06 -1.87% :white_check_mark:
cadene-resnext64x4 16 413.56 413.67 -0.03% :white_check_mark:
slim-mobilenet 64 3,172.10 3,824.46 -17.06% :red_circle:
slim-nasnetalarge 64 91.08 97.02 -6.12% :red_circle:
slim-resnet50v2 64 998.48 1,651.41 -39.54% :red_circle:
bert-mrpc-onnx 8 570.25 589.10 -3.20% :red_circle:
bert-mrpc-tf 1 286.89 288.10 -0.42% :white_check_mark:
pytorch-examples-wlang-gru 1 333.47 300.11 11.12% :high_brightness:
pytorch-examples-wlang-lstm 1 267.31 265.42 0.71% :white_check_mark:
torchvision-resnet50_1 1 455.44 437.05 4.21% :high_brightness:
cadene-dpn92_1 1 244.22 243.98 0.10% :white_check_mark:
cadene-resnext101_1 1 187.20 189.16 -1.03% :white_check_mark:
onnx-taau-downsample 1 190.07 204.02 -6.84% :red_circle:
dlrm-criteoterabyte 1 10.64 22.24 -52.16% :red_circle:
dlrm-criteoterabyte_fp16 1 38.45 41.43 -7.20% :red_circle:
agentmodel 1 5,871.30 6,319.63 -7.09% :red_circle:
unet_fp16 2 33.59 33.77 -0.52% :white_check_mark:
resnet50v1_fp16 1 527.96 549.27 -3.88% :red_circle:
resnet50v1_int8 1 458.55 456.03 0.55% :white_check_mark:
bert_base_cased_fp16 64 617.62 620.67 -0.49% :white_check_mark:
bert_large_uncased_fp16 32 193.87 193.76 0.06% :white_check_mark:
bert_large_fp16 1 103.80 103.76 0.04% :white_check_mark:
distilgpt2_fp16 16 1,188.12 1,186.96 0.10% :white_check_mark:
yolov5s 1 297.82 298.20 -0.13% :white_check_mark:
tinyllama 1 23.32 23.32 0.02% :white_check_mark:
vicuna-fastchat 1 133.31 133.50 -0.15% :white_check_mark:
whisper-tiny-encoder 1 241.09 240.87 0.09% :white_check_mark:
whisper-tiny-decoder 1 89.16 245.38 -63.67% :red_circle:

This build is not recommended to merge :red_circle:

migraphx-bot avatar Mar 06 '24 22:03 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Mar 06 '24 22:03 migraphx-bot