AMDMIGraphX
Adjust stride ordering rules for standard shape: stride can be anything in a dimension of size 1. This is for issue https://github.com/ROCm/AMDMIGraphX/issues/3117
Codecov Report
Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.
Project coverage is 92.14%. Comparing base (3843a96) to head (57b37f6). Report is 178 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/program.cpp | 75.00% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #3285 +/- ##
===========================================
- Coverage 92.16% 92.14% -0.02%
===========================================
Files 504 504
Lines 20486 20523 +37
===========================================
+ Hits 18880 18910 +30
- Misses 1606 1613 +7
Confirmed it also works on MI300 for good measure, although this should have been GPU-independent.
Added @pfultz2. Paul, the question is whether it ever makes sense to define a shape as both standard and broadcast, if the 0-length dimension is on an axis we're ignoring for the standard check. The only test case we know of that has both standard and broadcast properties is a scalar.
> the question is whether it ever makes sense to define a shape as both standard and broadcast
Yes it does make sense. "Standard" shape basically means that the offset and index are the same.
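To make the "offset and index are the same" definition concrete, here is a minimal sketch. It assumes a shape is described only by its `lens` and `strides`; `offset_of` and `index_equals_offset` are hypothetical helpers written for illustration, not MIGraphX functions. It shows why the stride of a size-1 dimension cannot matter (its coordinate is always 0) and why a scalar can satisfy the definition even with a zero stride.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not MIGraphX's API): memory offset of the element at
// row-major linear position `index`, given dimension lengths and strides.
std::size_t offset_of(std::size_t index,
                      const std::vector<std::size_t>& lens,
                      const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    // Peel off coordinates from the innermost axis outward.
    for(std::size_t i = lens.size(); i > 0; --i)
    {
        std::size_t coord = index % lens[i - 1];
        index /= lens[i - 1];
        offset += coord * strides[i - 1];
    }
    return offset;
}

// Hypothetical check: true when every linear index maps to the same offset.
bool index_equals_offset(const std::vector<std::size_t>& lens,
                         const std::vector<std::size_t>& strides)
{
    std::size_t elements = 1;
    for(auto len : lens)
        elements *= len;
    for(std::size_t i = 0; i < elements; ++i)
    {
        if(offset_of(i, lens, strides) != i)
            return false;
    }
    return true;
}
```

Under this sketch, lens `{2, 1, 3}` pass the check with strides `{3, 3, 1}` and equally with `{3, 100, 1}`, a transposed `{2, 3}` with strides `{1, 2}` fails, and a scalar with lens `{1}` and stride `{0}` passes.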
@bpickrel CI Failure...
[2024-07-25T00:26:40.785Z] [ RUN ] test_shape_alloc
[2024-07-25T00:26:40.785Z] test_verify: /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:511: std::vector<argument> migraphx::generic_eval(const module *, std::vector<context> &, std::unordered_map<std::string, argument>, std::unordered_map<instruction_ref, argument>, F) [F = (lambda at /home/jenkins/workspace/AMDMIGraphX_PR-3285/src/program.cpp:530:57)]: Assertion `ins->get_shape().any_of_dynamic() or results.at(ins).get_shape() == ins->get_shape()' failed.
[2024-07-25T00:26:40.785Z] CMake Error at gdb/test_test_verify_conv/run.cmake:16 (message):
[2024-07-25T00:26:40.785Z] Test failed
No obvious fix for the test failure, so this goes back to draft status.
Don't re-review until I get the current build failures fixed.
Now seeing a test fail in this code:
/*!
 * Check all shapes have the same layout.
 */
const check_shapes& same_layout() const
{
    if(not this->same([](const shape& s) { return find_permutation(s); }))
        MIGRAPHX_THROW(prefix() + "Layouts do not match");
    return *this;
}
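One way a relaxed size-1 stride rule could trip a layout comparison like this can be sketched as follows. This assumes, purely for illustration, that a layout is derived by ordering axes from largest to smallest stride; `axes_by_stride` is a hypothetical stand-in, and whether `find_permutation` behaves this way is an assumption, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a layout query (not MIGraphX's find_permutation):
// returns the axes ordered from largest to smallest stride, keeping the
// original axis order when strides tie.
std::vector<std::size_t> axes_by_stride(const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> perm(strides.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return strides[a] > strides[b];
    });
    return perm;
}
```

For lens `{2, 1, 3}`, strides `{3, 3, 1}` give the order `{0, 1, 2}` while strides `{3, 100, 1}` give `{1, 0, 2}`, even though both address memory identically because axis 1 has size 1. Under this assumption, two shapes that are equivalent in memory would still be reported as having different layouts.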
Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file: rocblas_fail.txt
> Took some time but replicated the fail on rocm-rome-6. Examining the output now. Output file:

So the failure is in fuse_ops, which means we have a bug somewhere where it is changing the shape.
A backtrace will help find out which matcher this is happening in.
I think I found it; just pushed for testing
| Test | Batch | Rate new 57b37f | Rate old b6b228 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,245.27 | 3,245.65 | -0.01% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,983.90 | 6,977.99 | 0.08% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,429.29 | 2,431.17 | -0.08% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,076.21 | 4,084.44 | -0.20% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,633.66 | 1,632.74 | 0.06% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,734.93 | 2,736.68 | -0.06% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 775.84 | 775.89 | -0.01% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 807.90 | 807.90 | 0.00% | :white_check_mark: |
| slim-mobilenet | 64 | 7,449.91 | 7,451.79 | -0.03% | :white_check_mark: |
| slim-nasnetalarge | 64 | 207.54 | 207.50 | 0.02% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,340.07 | 3,339.16 | 0.03% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,152.85 | 1,149.07 | 0.33% | :white_check_mark: |
| bert-mrpc-tf | 1 | 305.81 | 315.13 | -2.96% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 426.40 | 407.88 | 4.54% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 385.33 | 494.80 | -22.13% | :red_circle: |
| torchvision-resnet50_1 | 1 | 793.00 | 804.44 | -1.42% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 398.86 | 403.57 | -1.17% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 381.38 | 382.47 | -0.28% | :white_check_mark: |
| onnx-taau-downsample | 1 | 345.79 | 344.14 | 0.48% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 35.04 | 35.03 | 0.04% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 58.15 | 58.07 | 0.15% | :white_check_mark: |
| agentmodel | 1 | 8,152.62 | 9,682.04 | -15.80% | :red_circle: |
| unet_fp16 | 2 | 58.01 | 57.91 | 0.17% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 911.07 | 913.21 | -0.23% | :white_check_mark: |
| resnet50v1_int8 | 1 | 928.46 | 974.05 | -4.68% | :red_circle: |
| bert_base_cased_fp16 | 64 | 1,150.97 | 1,151.98 | -0.09% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.01 | 355.15 | -0.04% | :white_check_mark: |
| bert_large_fp16 | 1 | 211.94 | 210.49 | 0.69% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,160.74 | 2,155.85 | 0.23% | :white_check_mark: |
| yolov5s | 1 | 538.66 | 539.73 | -0.20% | :white_check_mark: |
| tinyllama | 1 | 43.39 | 43.50 | -0.25% | :white_check_mark: |
| vicuna-fastchat | 1 | 171.87 | 180.22 | -4.63% | :red_circle: |
| whisper-tiny-encoder | 1 | 409.45 | 410.60 | -0.28% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 424.24 | 433.80 | -2.20% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
Late follow-up: I repeated the perf test for resnet50v1_int8 on server XXXXX with both this branch (commit 57b37f6c4) and previous develop (commit 990071c73). There is still a significant speed deficit with this branch.
stride_ordering_for_mlir Rate: 1313.84 inferences/sec
develop Rate: 1412.36 inferences/sec