AMDMIGraphX

Remove rocmlir unsupported reduce types

Open dhernandez0 opened this issue 11 months ago • 9 comments

We are going to enable f16 for reduce in this rocmlir PR: https://github.com/ROCm/rocMLIR/pull/1722

However, the code here assumes bf16 and fp8 are supported, which is not the case.
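For context, the fix amounts to shrinking the allowlist that the rocMLIR offload check consults for reduce ops. A minimal sketch of that kind of gate, assuming hypothetical names (this is not the actual MIGraphX code), where f32 and, once rocMLIR PR 1722 lands, f16 are allowed while bf16 and fp8 are rejected:

```cpp
#include <set>
#include <string>

// Hypothetical sketch (not the actual MIGraphX API): a guard that only
// lets rocMLIR handle reduce ops for element types it actually supports.
// Per this issue, f32 and (with rocMLIR PR 1722) f16 are supported, while
// bf16 and fp8 must be excluded until rocMLIR gains support for them.
bool mlir_reduce_type_supported(const std::string& type)
{
    static const std::set<std::string> supported = {"float_type", "half_type"};
    return supported.count(type) > 0;
}
```

The point of keeping this a single predicate is that removing or re-adding a type is a one-line change when rocMLIR support changes.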

dhernandez0 avatar Jan 17 '25 14:01 dhernandez0

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.29%. Comparing base (79f95da) to head (30b200f).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3766   +/-   ##
========================================
  Coverage    92.29%   92.29%           
========================================
  Files          519      519           
  Lines        22233    22233           
========================================
  Hits         20520    20520           
  Misses        1713     1713           

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Jan 17 '25 16:01 codecov[bot]

By the way, reduce f16 should not be enabled for Navi: it does not support atomic_add with packed f16, so it would be slow. Leaving this comment here for whoever is assigned the PR. @causten

dhernandez0 avatar Jan 20 '25 15:01 dhernandez0

What's the status on this PR? Is there additional work needed or is the removal of the allowed types the only thing that needs review?

CharlieL7 avatar Jan 21 '25 17:01 CharlieL7

> What's the status on this PR? Is there additional work needed or is the removal of the allowed types the only thing that needs review?

I thought this was all that needed to be done, so I created the PR. However, we still need some tests, and we need to make sure reduce f16 is not enabled for Navi (except for gfx12). I think @causten will assign somebody from the migraphx team to finish this.

dhernandez0 avatar Jan 21 '25 19:01 dhernandez0

Test | Batch | Rate new (da147a) | Rate old (879f30) | Diff | Compare
torchvision-resnet50 64 3,237.78 3,237.49 0.01% :white_check_mark:
torchvision-resnet50_fp16 64 6,875.40 6,885.53 -0.15% :white_check_mark:
torchvision-densenet121 32 2,438.00 2,438.04 -0.00% :white_check_mark:
torchvision-densenet121_fp16 32 4,204.28 4,201.39 0.07% :white_check_mark:
torchvision-inceptionv3 32 1,615.08 1,614.18 0.06% :white_check_mark:
torchvision-inceptionv3_fp16 32 2,690.32 2,690.16 0.01% :white_check_mark:
cadene-inceptionv4 16 750.31 750.10 0.03% :white_check_mark:
cadene-resnext64x4 16 809.54 809.38 0.02% :white_check_mark:
slim-mobilenet 64 6,667.02 6,664.81 0.03% :white_check_mark:
slim-nasnetalarge 64 199.04 199.11 -0.03% :white_check_mark:
slim-resnet50v2 64 3,425.52 3,426.01 -0.01% :white_check_mark:
bert-mrpc-onnx 8 1,145.52 1,145.27 0.02% :white_check_mark:
bert-mrpc-tf 1 476.91 479.30 -0.50% :white_check_mark:
pytorch-examples-wlang-gru 1 480.10 470.08 2.13% :white_check_mark:
pytorch-examples-wlang-lstm 1 432.17 438.03 -1.34% :white_check_mark:
torchvision-resnet50_1 1 813.07 803.96 1.13% :white_check_mark:
cadene-dpn92_1 1 429.56 433.48 -0.90% :white_check_mark:
cadene-resnext101_1 1 390.30 389.71 0.15% :white_check_mark:
onnx-taau-downsample 1 373.05 373.58 -0.14% :white_check_mark:
dlrm-criteoterabyte 1 31.80 31.80 -0.02% :white_check_mark:
dlrm-criteoterabyte_fp16 1 51.05 51.06 -0.01% :white_check_mark:
agentmodel 1 8,707.08 8,764.93 -0.66% :white_check_mark:
unet_fp16 2 58.00 58.01 -0.02% :white_check_mark:
resnet50v1_fp16 1 1,040.62 1,021.79 1.84% :white_check_mark:
resnet50v1_int8 1 787.04 780.83 0.79% :white_check_mark:
bert_base_cased_fp16 64 1,172.83 1,172.44 0.03% :white_check_mark:
bert_large_uncased_fp16 32 362.44 362.45 -0.00% :white_check_mark:
bert_large_fp16 1 200.26 201.10 -0.42% :white_check_mark:
distilgpt2_fp16 16 2,217.99 2,219.84 -0.08% :white_check_mark:
yolov5s 1 512.25 521.90 -1.85% :white_check_mark:
tinyllama 1 43.63 43.62 0.04% :white_check_mark:
vicuna-fastchat 1 43.86 43.83 0.08% :white_check_mark:
whisper-tiny-encoder 1 412.29 412.03 0.06% :white_check_mark:
whisper-tiny-decoder 1 411.72 410.73 0.24% :white_check_mark:

This build is OK for merge :white_check_mark:

migraphx-bot avatar Jan 30 '25 07:01 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Jan 30 '25 07:01 migraphx-bot

@richagadgil we recently merged (rocmlir) support for bf16 atomic add for navi4 and gfx950, which means we can enable bf16 reductions for those two architectures. To sum up: Navi4 supports f32, bf16 and f16 reductions; Navi3 supports f32 reductions only; I think older Navis don't support reductions at all. MI series: all of them support f32 reductions (gfx906 doesn't, but I don't know if migraphx supports that architecture, as it's old). f16 reductions are supported on gfx908, gfx90a, gfx94x and gfx950. bf16 reductions are gfx950 only.

Maybe we can refactor this in the future into a centralized place, or just add a rocmlir API to figure out whether a reduction is supported? @pfultz2
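The support matrix described above could be encoded as a single predicate, which is roughly what a centralized check or rocmlir API would expose. A sketch under stated assumptions: the function and type names are hypothetical, and the gfx-prefix mapping (Navi3 = gfx11xx, Navi4 = gfx12xx) is an assumption, not something stated in this thread:

```cpp
#include <string>

// Illustrative only: encodes the per-architecture reduce support described
// in the comment above. Not a rocMLIR API; names are assumptions.
bool mlir_reduce_supported(const std::string& arch, const std::string& type)
{
    auto starts_with = [&](const char* p) { return arch.rfind(p, 0) == 0; };
    if(type == "float_type")
        // All MI-series except gfx906, plus Navi3 (gfx11xx) and Navi4 (gfx12xx)
        return arch != "gfx906" &&
               (starts_with("gfx9") || starts_with("gfx11") || starts_with("gfx12"));
    if(type == "half_type")
        // gfx908, gfx90a, gfx94x, gfx950, and Navi4 (packed-f16 atomic_add)
        return arch == "gfx908" || arch == "gfx90a" || starts_with("gfx94") ||
               arch == "gfx950" || starts_with("gfx12");
    if(type == "bf16_type")
        // bf16 atomic add recently merged for Navi4 and gfx950
        return arch == "gfx950" || starts_with("gfx12");
    return false;
}
```

Centralizing the matrix like this (or behind a rocmlir query) would keep MIGraphX from duplicating per-architecture knowledge that can drift out of date.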

dhernandez0 avatar Feb 28 '25 08:02 dhernandez0

@turneram I think this is still needed?

dhernandez0 avatar Mar 07 '25 15:03 dhernandez0

@richagadgil can we reopen this?

dhernandez0 avatar Mar 07 '25 16:03 dhernandez0