AMDMIGraphX
Integrate hipblaslt calls in gemm op
Codecov Report
Attention: Patch coverage is 91.44444%, with 77 lines in your changes missing coverage. Please review.
Project coverage is 91.82%. Comparing base (f0cb545) to head (f802ed1). Report is 100 commits behind head on develop.
:exclamation: Current head f802ed1 differs from the pull request's most recent head 1913b56.
Please upload reports for the commit 1913b56 to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| src/module.cpp | 82.11% | 27 Missing :warning: |
| src/argument.cpp | 0.00% | 7 Missing :warning: |
| src/shape.cpp | 0.00% | 7 Missing :warning: |
| src/cpp_generator.cpp | 40.00% | 6 Missing :warning: |
| src/fuse_pointwise_reduce.cpp | 0.00% | 6 Missing :warning: |
| src/split_reduce.cpp | 94.04% | 5 Missing :warning: |
| src/onnx/parse_matmul.cpp | 96.05% | 3 Missing :warning: |
| src/program.cpp | 70.00% | 3 Missing :warning: |
| src/fuse_reduce.cpp | 95.65% | 2 Missing :warning: |
| src/onnx/parse_convolution.cpp | 97.75% | 2 Missing :warning: |
| ... and 7 more |
Additional details and impacted files
@@ Coverage Diff @@
## develop #2671 +/- ##
===========================================
- Coverage 91.83% 91.82% -0.01%
===========================================
Files 479 486 +7
Lines 18340 18991 +651
===========================================
+ Hits 16842 17438 +596
- Misses 1498 1553 +55
:umbrella: View full report in Codecov by Sentry.
This should be a different operator. We can add a compile_blas pass, similar to compile_miopen (which does the tuning for convolution), that does the tuning through the compile function in the operation interface. We would just insert a placeholder operator during lowering, and then compile_blas would run the compile function to determine which solution is the fastest. This function can then return the time and the needed workspace.
The workspace should be inserted as an allocate instruction so the memory can be reused by non-hipblas kernels. By returning the workspace size, we can insert the needed allocate instruction and then pass it as a parameter to the hipblas operator. A sketch of the idea is shown below.
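Here is a minimal standalone C++ sketch of that flow. It is not the MIGraphX API; the names `compile_result`, `benchmark_solution`, and `compile_gemm` are hypothetical stand-ins that only illustrate how a compile hook could time each candidate hipBLASLt solution and report the fastest one's workspace so the pass can insert the matching allocate instruction.

```cpp
// Hypothetical sketch (not the actual MIGraphX interfaces): lowering inserts a
// placeholder gemm operator; a compile_blas-style pass then calls a compile()
// hook that benchmarks each candidate solution and returns the best time plus
// the workspace it needs, so an allocate instruction can be inserted and the
// buffer passed to the hipblas operator (and reused by other kernels).
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

struct compile_result            // what the compile hook hands back to the pass
{
    double time_ms;              // measured time of the best solution
    std::size_t workspace_bytes; // workspace that solution requires
    int solution_index;          // which candidate won
};

// Stand-in for launching and timing one candidate solution; the real pass
// would run the kernel on the GPU and time it.
double benchmark_solution(int /*solution_index*/) { return 1.0; }

// Try every candidate and keep the fastest one.
compile_result compile_gemm(const std::vector<std::size_t>& workspace_per_solution)
{
    compile_result best{std::numeric_limits<double>::max(), 0, -1};
    for(std::size_t i = 0; i < workspace_per_solution.size(); ++i)
    {
        double t = benchmark_solution(static_cast<int>(i));
        if(t < best.time_ms)
            best = {t, workspace_per_solution[i], static_cast<int>(i)};
    }
    return best;
}

int main()
{
    // Pretend three candidate solutions need different workspace sizes.
    auto best = compile_gemm({0, std::size_t{1} << 20, std::size_t{4} << 20});
    // The pass would now insert allocate(best.workspace_bytes) before the gemm
    // instruction and pass that buffer as a parameter to the hipblas operator.
    std::cout << "best solution " << best.solution_index << " needs "
              << best.workspace_bytes << " bytes of workspace\n";
}
```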
| Test | Batch | Rate new f802ed | Rate old bceef1 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 1,713.86 | 1,712.95 | 0.05% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 3,811.66 | 3,811.61 | 0.00% | :white_check_mark: |
| torchvision-densenet121 | 32 | 1,454.57 | 1,458.96 | -0.30% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 2,438.67 | 2,432.65 | 0.25% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 883.89 | 883.89 | 0.00% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 623.23 | 1,381.08 | -54.87% | :red_circle: |
| cadene-inceptionv4 | 16 | 400.41 | 408.06 | -1.87% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 413.56 | 413.67 | -0.03% | :white_check_mark: |
| slim-mobilenet | 64 | 3,172.10 | 3,824.46 | -17.06% | :red_circle: |
| slim-nasnetalarge | 64 | 91.08 | 97.02 | -6.12% | :red_circle: |
| slim-resnet50v2 | 64 | 998.48 | 1,651.41 | -39.54% | :red_circle: |
| bert-mrpc-onnx | 8 | 570.25 | 589.10 | -3.20% | :red_circle: |
| bert-mrpc-tf | 1 | 286.89 | 288.10 | -0.42% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 333.47 | 300.11 | 11.12% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 267.31 | 265.42 | 0.71% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 455.44 | 437.05 | 4.21% | :high_brightness: |
| cadene-dpn92_1 | 1 | 244.22 | 243.98 | 0.10% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 187.20 | 189.16 | -1.03% | :white_check_mark: |
| onnx-taau-downsample | 1 | 190.07 | 204.02 | -6.84% | :red_circle: |
| dlrm-criteoterabyte | 1 | 10.64 | 22.24 | -52.16% | :red_circle: |
| dlrm-criteoterabyte_fp16 | 1 | 38.45 | 41.43 | -7.20% | :red_circle: |
| agentmodel | 1 | 5,871.30 | 6,319.63 | -7.09% | :red_circle: |
| unet_fp16 | 2 | 33.59 | 33.77 | -0.52% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 527.96 | 549.27 | -3.88% | :red_circle: |
| resnet50v1_int8 | 1 | 458.55 | 456.03 | 0.55% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 617.62 | 620.67 | -0.49% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 193.87 | 193.76 | 0.06% | :white_check_mark: |
| bert_large_fp16 | 1 | 103.80 | 103.76 | 0.04% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 1,188.12 | 1,186.96 | 0.10% | :white_check_mark: |
| yolov5s | 1 | 297.82 | 298.20 | -0.13% | :white_check_mark: |
| tinyllama | 1 | 23.32 | 23.32 | 0.02% | :white_check_mark: |
| vicuna-fastchat | 1 | 133.31 | 133.50 | -0.15% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 241.09 | 240.87 | 0.09% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 89.16 | 245.38 | -63.67% | :red_circle: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output