Remove rocmlir unsupported reduce types
We are going to enable f16 for reduce in this rocmlir PR: https://github.com/ROCm/rocMLIR/pull/1722
However, the code here assumes bf16 and fp8 are supported, which is not the case.
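For context, a minimal sketch of the kind of change involved, assuming a hypothetical allow-list check (the names `is_mlir_reduce_type_supported` and `allowed_reduce_types` are illustrative, not the actual MIGraphX identifiers):

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical illustration: restrict the reduce types offloaded to rocMLIR.
// "type_t" stands in for MIGraphX's shape::type_t enum.
enum class type_t { float_type, half_type, bf16_type, fp8e4m3fn_type };

bool is_mlir_reduce_type_supported(type_t t)
{
    // Before: bf16 and fp8 were (incorrectly) listed as supported.
    // After: only f32 and f16 are offered to rocMLIR for reduction.
    const std::initializer_list<type_t> allowed_reduce_types = {
        type_t::float_type, type_t::half_type};
    return std::find(allowed_reduce_types.begin(), allowed_reduce_types.end(), t) !=
           allowed_reduce_types.end();
}
```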
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.29%. Comparing base (79f95da) to head (30b200f).
Additional details and impacted files
```
@@           Coverage Diff            @@
##           develop    #3766   +/-   ##
========================================
  Coverage    92.29%   92.29%
========================================
  Files          519      519
  Lines        22233    22233
========================================
  Hits         20520    20520
  Misses        1713     1713
```
By the way, reduce f16 should not be enabled for Navi: it does not support atomic_add with packed f16, so it would be slow. Leaving this comment here for whoever is assigned the PR. @causten
What's the status on this PR? Is there additional work needed or is the removal of the allowed types the only thing that needs review?
I thought this was all that needed to be done, so I created the PR. However, we still need some tests, and we need to make sure reduce f16 is not enabled for Navi (except for gfx12). I think @causten will assign somebody from the migraphx team to finish this.
| Test | Batch | Rate new da147a | Rate old 879f30 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,237.78 | 3,237.49 | 0.01% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,875.40 | 6,885.53 | -0.15% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,438.00 | 2,438.04 | -0.00% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,204.28 | 4,201.39 | 0.07% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,615.08 | 1,614.18 | 0.06% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,690.32 | 2,690.16 | 0.01% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 750.31 | 750.10 | 0.03% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 809.54 | 809.38 | 0.02% | :white_check_mark: |
| slim-mobilenet | 64 | 6,667.02 | 6,664.81 | 0.03% | :white_check_mark: |
| slim-nasnetalarge | 64 | 199.04 | 199.11 | -0.03% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,425.52 | 3,426.01 | -0.01% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,145.52 | 1,145.27 | 0.02% | :white_check_mark: |
| bert-mrpc-tf | 1 | 476.91 | 479.30 | -0.50% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 480.10 | 470.08 | 2.13% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 432.17 | 438.03 | -1.34% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 813.07 | 803.96 | 1.13% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 429.56 | 433.48 | -0.90% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 390.30 | 389.71 | 0.15% | :white_check_mark: |
| onnx-taau-downsample | 1 | 373.05 | 373.58 | -0.14% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.80 | 31.80 | -0.02% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.05 | 51.06 | -0.01% | :white_check_mark: |
| agentmodel | 1 | 8,707.08 | 8,764.93 | -0.66% | :white_check_mark: |
| unet_fp16 | 2 | 58.00 | 58.01 | -0.02% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 1,040.62 | 1,021.79 | 1.84% | :white_check_mark: |
| resnet50v1_int8 | 1 | 787.04 | 780.83 | 0.79% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,172.83 | 1,172.44 | 0.03% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 362.44 | 362.45 | -0.00% | :white_check_mark: |
| bert_large_fp16 | 1 | 200.26 | 201.10 | -0.42% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,217.99 | 2,219.84 | -0.08% | :white_check_mark: |
| yolov5s | 1 | 512.25 | 521.90 | -1.85% | :white_check_mark: |
| tinyllama | 1 | 43.63 | 43.62 | 0.04% | :white_check_mark: |
| vicuna-fastchat | 1 | 43.86 | 43.83 | 0.08% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 412.29 | 412.03 | 0.06% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 411.72 | 410.73 | 0.24% | :white_check_mark: |
This build is OK for merge :white_check_mark:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
@richagadgil we recently merged (rocmlir) support for bf16 atomic add for navi4 and gfx950, which means we can enable bf16 reductions for those two architectures. To sum up: Navi4 supports f32, bf16, and f16 reductions; Navi3 supports f32 reductions only; I think older Navis don't support reduction at all. On the MI series, all of them support f32 reductions (gfx906 doesn't, but I don't know if migraphx supports that one, as it's old); f16 reductions are supported on gfx908, gfx90a, gfx94x, and gfx950; bf16 reductions on gfx950 only.
Maybe we can refactor this in the future into a centralized place, or just have a rocmlir API to figure out whether reduction is supported (a rough sketch of what that could look like is below)? @pfultz2
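A minimal sketch of such a centralized check, assuming a hypothetical `mlir_reduce_supported(arch, type)` helper that simply encodes the support matrix from the comment above (the function name and the prefix-based dispatch are illustrative, not an actual rocMLIR or MIGraphX API):

```cpp
#include <string>

// Hypothetical helper encoding the reduce-support matrix described above.
enum class reduce_type { f32, f16, bf16 };

static bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.rfind(prefix, 0) == 0;
}

bool mlir_reduce_supported(const std::string& arch, reduce_type t)
{
    // Navi4 (gfx12xx): f32, f16, and bf16 (bf16 atomic add recently merged).
    if(starts_with(arch, "gfx12"))
        return true;
    // Navi3 (gfx11xx): f32 only (no packed-f16 atomic add).
    if(starts_with(arch, "gfx11"))
        return t == reduce_type::f32;
    // Older Navis: no reduction support.
    if(starts_with(arch, "gfx10"))
        return false;
    // gfx950: f32, f16, and bf16.
    if(starts_with(arch, "gfx950"))
        return true;
    // gfx908, gfx90a, gfx94x: f32 and f16.
    if(starts_with(arch, "gfx908") || starts_with(arch, "gfx90a") ||
       starts_with(arch, "gfx94"))
        return t == reduce_type::f32 || t == reduce_type::f16;
    // gfx906: no reduction support.
    if(starts_with(arch, "gfx906"))
        return false;
    // Remaining MI-series targets: f32 only.
    return t == reduce_type::f32;
}
```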
@turneram I think this is still needed?
@richagadgil can we reopen this?