adjust stride ordering rules for standard shape: stride can be anything in a dimension of size 1

Open bpickrel opened this issue 1 year ago • 9 comments

This is for issue https://github.com/ROCm/AMDMIGraphX/issues/3117

bpickrel avatar Jul 17 '24 22:07 bpickrel
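
To make the rule in the title concrete, here is a minimal sketch of a relaxed "standard" check. It is an illustration only and is not taken from the MIGraphX source: the function name `is_standard_relaxed` and its signature are hypothetical, and it assumes "standard" means packed row-major strides.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not MIGraphX's own shape check: returns true when the
// strides describe a packed row-major layout, ignoring the stride of any
// dimension whose length is 1.
bool is_standard_relaxed(const std::vector<std::size_t>& lens,
                         const std::vector<std::size_t>& strides)
{
    if(lens.size() != strides.size())
        return false;
    std::size_t expected = 1; // stride a packed layout would have on this axis
    for(std::size_t i = lens.size(); i > 0; --i)
    {
        const std::size_t axis = i - 1;
        // A size-1 axis is only ever indexed at 0, so its stride never
        // contributes to the memory offset and can be skipped.
        if(lens[axis] != 1 and strides[axis] != expected)
            return false;
        expected *= lens[axis];
    }
    return true;
}
```

Under this sketch, lens `{2, 1, 3}` with strides `{3, 100, 1}` counts as standard, because the only unusual stride sits on the size-1 axis.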

Codecov Report

Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.14%. Comparing base (3843a96) to head (57b37f6). Report is 178 commits behind head on develop.

Files with missing lines Patch % Lines
src/program.cpp 75.00% 3 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3285      +/-   ##
===========================================
- Coverage    92.16%   92.14%   -0.02%     
===========================================
  Files          504      504              
  Lines        20486    20523      +37     
===========================================
+ Hits         18880    18910      +30     
- Misses        1606     1613       +7     

codecov[bot] avatar Jul 18 '24 15:07 codecov[bot]

Confirmed it also works on MI300 for good measure, although this should have been GPU-independent.

bpickrel avatar Jul 19 '24 19:07 bpickrel

Added @pfultz2. Paul, the question is whether it ever makes sense to define a shape as both standard and broadcast, if the 0-length dimension is on an axis we're ignoring for the standard check. The only test case we know of that has both standard and broadcast properties is a scalar.

bpickrel avatar Jul 22 '24 23:07 bpickrel

> the question is whether it ever makes sense to define a shape as both standard and broadcast

Yes it does make sense. "Standard" shape basically means that the offset and index are the same.

pfultz2 avatar Jul 23 '24 15:07 pfultz2
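
The "offset and index are the same" description can be sketched as a brute-force check. This is an illustration under stated assumptions, not MIGraphX code: `index_to_offset` and `offset_equals_index` are hypothetical names, indices are unravelled in row-major order, and all lengths are assumed to be nonzero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: map a linear element index to a memory offset by
// unravelling it into row-major coordinates and taking the dot product
// of those coordinates with the strides.
std::size_t index_to_offset(std::size_t index,
                            const std::vector<std::size_t>& lens,
                            const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    for(std::size_t i = lens.size(); i > 0; --i)
    {
        const std::size_t axis = i - 1;
        offset += (index % lens[axis]) * strides[axis];
        index /= lens[axis];
    }
    return offset;
}

// True when every element index maps to an offset equal to itself.
bool offset_equals_index(const std::vector<std::size_t>& lens,
                         const std::vector<std::size_t>& strides)
{
    std::size_t elements = 1;
    for(std::size_t len : lens)
        elements *= len;
    for(std::size_t index = 0; index < elements; ++index)
    {
        if(index_to_offset(index, lens, strides) != index)
            return false;
    }
    return true;
}
```

With this definition a scalar (lens `{1}`, strides `{0}`) passes even though its stride is 0, which is consistent with a shape having both the standard and broadcast properties. A genuinely broadcast shape such as lens `{2, 3}` with strides `{0, 1}` does not pass.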

@bpickrel CI Failure...

[2024-07-25T00:26:40.785Z] [   RUN    ] test_shape_alloc
[2024-07-25T00:26:40.785Z] test_verify: /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:511: std::vector<argument> migraphx::generic_eval(const module *, std::vector<context> &, std::unordered_map<std::string, argument>, std::unordered_map<instruction_ref, argument>, F) [F = (lambda at /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:530:57)]: Assertion `ins->get_shape().any_of_dynamic() or results.at(ins).get_shape() == ins->get_shape()' failed.

[2024-07-25T00:26:40.785Z] CMake Error at gdb/test_test_verify_conv/run.cmake:16 (message):
[2024-07-25T00:26:40.785Z]   Test failed

causten avatar Jul 26 '24 15:07 causten

Not an obvious fix to test failure--back to draft status.

bpickrel avatar Jul 26 '24 23:07 bpickrel

Don't re-review until I get current build fails fixed.

bpickrel avatar Jul 31 '24 19:07 bpickrel

Now seeing a test fail in this code:

    /*!
     * Check all shapes have the same layout.
     */
    const check_shapes& same_layout() const
    {
        if(not this->same([](const shape& s) { return find_permutation(s); }))
            MIGRAPHX_THROW(prefix() + "Layouts do not match");
        return *this;
    }

bpickrel avatar Aug 06 '24 23:08 bpickrel
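
For context, here is one way a layout comparison can become sensitive to strides on size-1 axes. This is a sketch under a loud assumption: it supposes the layout is derived by ordering axes from largest to smallest stride. The helper `layout_by_stride` is hypothetical, the real `find_permutation` may work differently, and this is not a diagnosis of the failing test.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: order the axes from largest to smallest stride,
// keeping the original order for ties (stable sort).
// Assumption for illustration only; not the library's find_permutation.
std::vector<std::size_t> layout_by_stride(const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> axes(strides.size());
    std::iota(axes.begin(), axes.end(), 0);
    std::stable_sort(axes.begin(), axes.end(), [&](std::size_t a, std::size_t b) {
        return strides[a] > strides[b];
    });
    return axes;
}
```

For lens `{2, 1, 3}`, strides `{3, 3, 1}` give the axis order `{0, 1, 2}`, while strides `{3, 100, 1}` give `{1, 0, 2}`. Both describe the same memory layout, yet a comparison based on this ordering would report them as different.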

Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file: rocblas_fail.txt

bpickrel avatar Sep 09 '24 22:09 bpickrel

> Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file:

So the failure is in fuse_ops, which means we have a bug somewhere where it is changing the shape.

A backtrace will help find out which matcher this is happening in.

pfultz2 avatar Sep 09 '24 22:09 pfultz2

> > Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file:
>
> So the failure is in fuse_ops, which means we have a bug somewhere where it is changing the shape.
>
> A backtrace will help find out which matcher this is happening in.

I think I found it; just pushed for testing.

bpickrel avatar Sep 09 '24 23:09 bpickrel

| Test | Batch | Rate new (57b37f) | Rate old (b6b228) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,245.27 | 3,245.65 | -0.01% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,983.90 | 6,977.99 | 0.08% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,429.29 | 2,431.17 | -0.08% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,076.21 | 4,084.44 | -0.20% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,633.66 | 1,632.74 | 0.06% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,734.93 | 2,736.68 | -0.06% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 775.84 | 775.89 | -0.01% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 807.90 | 807.90 | 0.00% | :white_check_mark: |
| slim-mobilenet | 64 | 7,449.91 | 7,451.79 | -0.03% | :white_check_mark: |
| slim-nasnetalarge | 64 | 207.54 | 207.50 | 0.02% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,340.07 | 3,339.16 | 0.03% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,152.85 | 1,149.07 | 0.33% | :white_check_mark: |
| bert-mrpc-tf | 1 | 305.81 | 315.13 | -2.96% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 426.40 | 407.88 | 4.54% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 385.33 | 494.80 | -22.13% | :red_circle: |
| torchvision-resnet50_1 | 1 | 793.00 | 804.44 | -1.42% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 398.86 | 403.57 | -1.17% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 381.38 | 382.47 | -0.28% | :white_check_mark: |
| onnx-taau-downsample | 1 | 345.79 | 344.14 | 0.48% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 35.04 | 35.03 | 0.04% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 58.15 | 58.07 | 0.15% | :white_check_mark: |
| agentmodel | 1 | 8,152.62 | 9,682.04 | -15.80% | :red_circle: |
| unet_fp16 | 2 | 58.01 | 57.91 | 0.17% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 911.07 | 913.21 | -0.23% | :white_check_mark: |
| resnet50v1_int8 | 1 | 928.46 | 974.05 | -4.68% | :red_circle: |
| bert_base_cased_fp16 | 64 | 1,150.97 | 1,151.98 | -0.09% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.01 | 355.15 | -0.04% | :white_check_mark: |
| bert_large_fp16 | 1 | 211.94 | 210.49 | 0.69% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,160.74 | 2,155.85 | 0.23% | :white_check_mark: |
| yolov5s | 1 | 538.66 | 539.73 | -0.20% | :white_check_mark: |
| tinyllama | 1 | 43.39 | 43.50 | -0.25% | :white_check_mark: |
| vicuna-fastchat | 1 | 171.87 | 180.22 | -4.63% | :red_circle: |
| whisper-tiny-encoder | 1 | 409.45 | 410.60 | -0.28% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 424.24 | 433.80 | -2.20% | :white_check_mark: |

This build is not recommended to merge :red_circle:

migraphx-bot avatar Sep 10 '24 02:09 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Sep 10 '24 02:09 migraphx-bot

Late followup: I repeated the perf test for resnet50v1_int8 on server XXXXX with both this branch (commit 57b37f6c4) and previous develop (commit 990071c73). There is still a significant speed deficit with this branch.

stride_ordering_for_mlir Rate: 1313.84 inferences/sec
develop                  Rate: 1412.36 inferences/sec

bpickrel avatar Sep 11 '24 21:09 bpickrel