AMDMIGraphX adjust stride ordering rules for standard shape: stride can be anything in a dimension of size 1.

This is for issue https://github.com/ROCm/AMDMIGraphX/issues/3117
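
For readers following along, the relaxed rule can be sketched roughly as below. This is a minimal standalone illustration, not the code in this PR: `simple_shape` and `is_standard_relaxed` are hypothetical names, and the check simply walks the axes from innermost to outermost, skipping any dimension of size 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shape: dimension lengths plus strides.
struct simple_shape
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;
};

// Returns true when the strides describe a packed, row-major layout.
// Dimensions of size 1 are skipped, because their stride is only ever
// multiplied by index 0 and so never affects a memory offset.
bool is_standard_relaxed(const simple_shape& s)
{
    assert(s.lens.size() == s.strides.size());
    std::size_t expected = 1;
    for(std::size_t i = s.lens.size(); i > 0; --i)
    {
        const std::size_t axis = i - 1;
        if(s.lens[axis] == 1)
            continue; // any stride is acceptable on this axis
        if(s.strides[axis] != expected)
            return false;
        expected *= s.lens[axis];
    }
    return true;
}
```

Under this sketch, lens {2, 1, 3} with strides {3, 100, 1} is accepted, while lens {2, 3} with strides {1, 2} is not.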

Jul 17 '24 22:07 bpickrel

Codecov Report

Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.14%. Comparing base (3843a96) to head (57b37f6). Report is 178 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/program.cpp	75.00%	3 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3285      +/-   ##
===========================================
- Coverage    92.16%   92.14%   -0.02%     
===========================================
  Files          504      504              
  Lines        20486    20523      +37     
===========================================
+ Hits         18880    18910      +30     
- Misses        1606     1613       +7

Jul 18 '24 15:07 codecov[bot]

Confirmed it also works on MI300 for good measure, although this should have been GPU-independent.

Jul 19 '24 19:07 bpickrel

Added @pfultz2. Paul, the question is whether it ever makes sense to define a shape as both standard and broadcast, if the 0-length dimension is on an axis we're ignoring for the standard check. The only test case we know of that has both standard and broadcast properties is a scalar.

Jul 22 '24 23:07 bpickrel

> the question is whether it ever makes sense to define a shape as both standard and broadcast

Yes it does make sense. "Standard" shape basically means that the offset and index are the same.
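
One way to read that definition: walk every row-major linear index, compute the memory offset from the strides, and require the two to match. The sketch below assumes that reading; `offset_of` and `index_equals_offset` are hypothetical helpers, not MIGraphX functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Memory offset of the element at row-major linear position `index`.
// Assumes every entry of `lens` is nonzero.
std::size_t offset_of(const std::vector<std::size_t>& lens,
                      const std::vector<std::size_t>& strides,
                      std::size_t index)
{
    assert(lens.size() == strides.size());
    std::size_t offset = 0;
    for(std::size_t i = lens.size(); i > 0; --i)
    {
        const std::size_t axis = i - 1;
        offset += (index % lens[axis]) * strides[axis]; // coordinate on this axis times its stride
        index /= lens[axis];
    }
    return offset;
}

// Brute-force check that offset == index for every element.
bool index_equals_offset(const std::vector<std::size_t>& lens,
                         const std::vector<std::size_t>& strides)
{
    std::size_t elements = 1;
    for(std::size_t len : lens)
        elements *= len;
    for(std::size_t i = 0; i < elements; ++i)
    {
        if(offset_of(lens, strides, i) != i)
            return false;
    }
    return true;
}
```

Under this reading, lens {2, 1, 3} with strides {3, 100, 1} passes, since the size-1 axis only ever contributes coordinate 0. A one-element shape with stride 0 passes too, which matches the scalar case mentioned above.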

Jul 23 '24 15:07 pfultz2

@bpickrel CI Failure...

[2024-07-25T00:26:40.785Z] [   RUN    ] test_shape_alloc
[2024-07-25T00:26:40.785Z] test_verify: /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:511: std::vector<argument> migraphx::generic_eval(const module *, std::vector<context> &, std::unordered_map<std::string, argument>, std::unordered_map<instruction_ref, argument>, F) [F = (lambda at /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:530:57)]: Assertion `ins->get_shape().any_of_dynamic() or results.at(ins).get_shape() == ins->get_shape()' failed.

[2024-07-25T00:26:40.785Z] CMake Error at gdb/test_test_verify_conv/run.cmake:16 (message):
[2024-07-25T00:26:40.785Z]   Test failed

Jul 26 '24 15:07 causten

No obvious fix for the test failure; moving this back to draft status.

Jul 26 '24 23:07 bpickrel

Please don't re-review until I get the current build failures fixed.

Jul 31 '24 19:07 bpickrel

Now seeing a test fail in this code:

    /*!
     * Check all shapes have the same layout.
     */
    const check_shapes& same_layout() const
    {
        if(not this->same([](const shape& s) { return find_permutation(s); }))
            MIGRAPHX_THROW(prefix() + "Layouts do not match");
        return *this;
    }
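
To illustrate why a layout comparison can be sensitive to strides on size-1 dimensions, here is a simplified stand-in that derives an axis order by sorting strides from largest to smallest. It is an assumption for illustration only: `stride_permutation` and `same_stride_layout` are hypothetical names, and the real `find_permutation` may break ties or treat size-1 axes differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical simplification: order the axes from largest stride to smallest.
// Ties keep their original relative order (stable sort).
std::vector<std::size_t> stride_permutation(const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> perm(strides.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return strides[a] > strides[b];
    });
    return perm;
}

// True when every stride vector yields the same axis order.
bool same_stride_layout(const std::vector<std::vector<std::size_t>>& all_strides)
{
    if(all_strides.empty())
        return true;
    const auto first = stride_permutation(all_strides.front());
    return std::all_of(all_strides.begin(), all_strides.end(), [&](const auto& s) {
        assert(s.size() == all_strides.front().size()); // all shapes share one rank
        return stride_permutation(s) == first;
    });
}
```

With this stand-in, strides {3, 3, 1} and {3, 100, 1} for lens {2, 1, 3} yield different axis orders even though they address the same elements. Whether that is what triggers the failure here is not established in this thread.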

Aug 06 '24 23:08 bpickrel

Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file: rocblas_fail.txt

Sep 09 '24 22:09 bpickrel

> Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file:

So the failure is in fuse_ops, which means we have a bug somewhere that is changing the shape.

A backtrace will help find out which matcher this is happening in.

Sep 09 '24 22:09 pfultz2

> > Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file:
>
> So the failure is in fuse_ops, which means we have a bug somewhere that is changing the shape.
>
> A backtrace will help find out which matcher this is happening in.

I think I found it; just pushed for testing

Sep 09 '24 23:09 bpickrel

Test	Batch	Rate new 57b37f	Rate old b6b228	Diff	Compare
torchvision-resnet50	64	3,245.27	3,245.65	-0.01%	:white_check_mark:
torchvision-resnet50_fp16	64	6,983.90	6,977.99	0.08%	:white_check_mark:
torchvision-densenet121	32	2,429.29	2,431.17	-0.08%	:white_check_mark:
torchvision-densenet121_fp16	32	4,076.21	4,084.44	-0.20%	:white_check_mark:
torchvision-inceptionv3	32	1,633.66	1,632.74	0.06%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,734.93	2,736.68	-0.06%	:white_check_mark:
cadene-inceptionv4	16	775.84	775.89	-0.01%	:white_check_mark:
cadene-resnext64x4	16	807.90	807.90	0.00%	:white_check_mark:
slim-mobilenet	64	7,449.91	7,451.79	-0.03%	:white_check_mark:
slim-nasnetalarge	64	207.54	207.50	0.02%	:white_check_mark:
slim-resnet50v2	64	3,340.07	3,339.16	0.03%	:white_check_mark:
bert-mrpc-onnx	8	1,152.85	1,149.07	0.33%	:white_check_mark:
bert-mrpc-tf	1	305.81	315.13	-2.96%	:white_check_mark:
pytorch-examples-wlang-gru	1	426.40	407.88	4.54%	:high_brightness:
pytorch-examples-wlang-lstm	1	385.33	494.80	-22.13%	:red_circle:
torchvision-resnet50_1	1	793.00	804.44	-1.42%	:white_check_mark:
cadene-dpn92_1	1	398.86	403.57	-1.17%	:white_check_mark:
cadene-resnext101_1	1	381.38	382.47	-0.28%	:white_check_mark:
onnx-taau-downsample	1	345.79	344.14	0.48%	:white_check_mark:
dlrm-criteoterabyte	1	35.04	35.03	0.04%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	58.15	58.07	0.15%	:white_check_mark:
agentmodel	1	8,152.62	9,682.04	-15.80%	:red_circle:
unet_fp16	2	58.01	57.91	0.17%	:white_check_mark:
resnet50v1_fp16	1	911.07	913.21	-0.23%	:white_check_mark:
resnet50v1_int8	1	928.46	974.05	-4.68%	:red_circle:
bert_base_cased_fp16	64	1,150.97	1,151.98	-0.09%	:white_check_mark:
bert_large_uncased_fp16	32	355.01	355.15	-0.04%	:white_check_mark:
bert_large_fp16	1	211.94	210.49	0.69%	:white_check_mark:
distilgpt2_fp16	16	2,160.74	2,155.85	0.23%	:white_check_mark:
yolov5s	1	538.66	539.73	-0.20%	:white_check_mark:
tinyllama	1	43.39	43.50	-0.25%	:white_check_mark:
vicuna-fastchat	1	171.87	180.22	-4.63%	:red_circle:
whisper-tiny-encoder	1	409.45	410.60	-0.28%	:white_check_mark:
whisper-tiny-decoder	1	424.24	433.80	-2.20%	:white_check_mark:

This build is not recommended to merge :red_circle:

Sep 10 '24 02:09 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

:white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance

:white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:white_check_mark: unet: PASSED: MIGraphX meets tolerance

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

Sep 10 '24 02:09 migraphx-bot

Late follow-up: I repeated the perf test for resnet50v1_int8 on server XXXXX with both this branch (commit 57b37f6c4) and previous develop (commit 990071c73). There is still a significant speed deficit with this branch.

stride_ordering_for_mlir Rate: 1313.84 inferences/sec
develop                  Rate: 1412.36 inferences/sec
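
To put a number on the deficit, the relative change can be computed as (new - old) / old, which appears to be consistent with the Diff column in the bot's table. A minimal sketch; `percent_change` is a hypothetical helper name:

```cpp
#include <cassert>

// Relative change of `new_rate` against `old_rate`, in percent.
// Negative values mean the new build is slower.
double percent_change(double new_rate, double old_rate)
{
    assert(old_rate > 0.0);
    return (new_rate - old_rate) / old_rate * 100.0;
}
```

For the two rates above, `percent_change(1313.84, 1412.36)` comes out at roughly -7%.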

Sep 11 '24 21:09 bpickrel
