Preload tiles into LDS to improve performance of pointwise transposes
Fixes #3172.
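For readers unfamiliar with the technique named in the title, here is a minimal CPU-side sketch of a tiled transpose. It is not the kernel from this PR. The function name `tiled_transpose`, the `tile` parameter, and the `staged` buffer are all hypothetical, and `staged` only stands in for an LDS (shared memory) tile. The idea it illustrates is that a tile is read in row order, held in fast local storage, and then written out in column order, so that both the reads and the writes touch contiguous memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the PR: transposes a rows x cols
// row-major matrix by staging one tile at a time in a small buffer.
// The buffer is a stand-in for an LDS (shared memory) tile on a GPU.
std::vector<float> tiled_transpose(const std::vector<float>& in,
                                   std::size_t rows,
                                   std::size_t cols,
                                   std::size_t tile = 16)
{
    std::vector<float> out(rows * cols);
    std::vector<float> staged(tile * tile); // assumed tile storage

    for(std::size_t r0 = 0; r0 < rows; r0 += tile)
    {
        for(std::size_t c0 = 0; c0 < cols; c0 += tile)
        {
            // Clamp the tile at the matrix edges.
            const std::size_t th = std::min(tile, rows - r0);
            const std::size_t tw = std::min(tile, cols - c0);

            // Preload: read the tile row by row, so reads from `in` are contiguous.
            for(std::size_t r = 0; r < th; ++r)
                for(std::size_t c = 0; c < tw; ++c)
                    staged[r * tile + c] = in[(r0 + r) * cols + (c0 + c)];

            // Store: walk the staged tile column by column, so writes to `out`
            // are contiguous in the transposed (cols x rows) layout.
            for(std::size_t c = 0; c < tw; ++c)
                for(std::size_t r = 0; r < th; ++r)
                    out[(c0 + c) * rows + (r0 + r)] = staged[r * tile + c];
        }
    }
    return out;
}
```

On a GPU the two inner loop nests would be spread across the threads of a workgroup with a barrier between them; this sketch only shows the access pattern.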
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.02%. Comparing base (e230c02) to head (289067b). Report is 161 commits behind head on develop.
Additional details and impacted files
| Coverage Diff | develop | #3362 | +/- |
|---|---|---|---|
| Coverage | 92.04% | 92.02% | -0.03% |
| Files | 506 | 509 | +3 |
| Lines | 20872 | 21005 | +133 |
| Hits | 19212 | 19330 | +118 |
| Misses | 1660 | 1675 | +15 |
| Flag | Coverage Δ | |
|---|---|---|
| | 92.02% <100.00%> (-0.03%) | :arrow_down: |
@kahmed10 there are some other failures in CI with this one.
It seems to be failing test_spacetodepth_example_cpu from the ONNX backend. The test name says CPU, but the compiled program looks to be using the GPU...
@pfultz2 Looks like it's still failing in CI, but on pooling in the all-targets build in Jenkins.
] 354/382 Test #365: test_api_custom_op_gpu ....................................................***Exception: SegFault 1.84 sec
Is this related to your PR #3423, which removes padding from pooling? It looks like some of the failures are in pooling itself.
| Test | Batch | Rate new 4a20e6 | Rate old 1ab830 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,249.19 | 3,257.25 | -0.25% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,991.94 | 6,998.54 | -0.09% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,431.98 | 2,431.43 | 0.02% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,097.00 | 4,095.87 | 0.03% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,638.50 | 1,637.26 | 0.08% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,744.19 | 2,742.87 | 0.05% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 779.12 | 779.29 | -0.02% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 808.38 | 807.81 | 0.07% | :white_check_mark: |
| slim-mobilenet | 64 | 7,456.72 | 7,455.44 | 0.02% | :white_check_mark: |
| slim-nasnetalarge | 64 | 208.18 | 208.13 | 0.02% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,435.43 | 3,441.14 | -0.17% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,148.21 | 1,155.05 | -0.59% | :white_check_mark: |
| bert-mrpc-tf | 1 | 306.87 | 317.57 | -3.37% | :red_circle: |
| pytorch-examples-wlang-gru | 1 | 421.54 | 386.94 | 8.94% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 379.34 | 381.97 | -0.69% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 772.28 | 801.56 | -3.65% | :red_circle: |
| cadene-dpn92_1 | 1 | 437.16 | 400.33 | 9.20% | :high_brightness: |
| cadene-resnext101_1 | 1 | 383.35 | 383.10 | 0.07% | :white_check_mark: |
| onnx-taau-downsample | 1 | 366.49 | 343.39 | 6.72% | :high_brightness: |
| dlrm-criteoterabyte | 1 | 35.05 | 35.03 | 0.04% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 58.08 | 58.13 | -0.09% | :white_check_mark: |
| agentmodel | 1 | 8,111.35 | 8,052.83 | 0.73% | :white_check_mark: |
| unet_fp16 | 2 | 58.89 | 57.80 | 1.90% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 927.80 | 939.96 | -1.29% | :white_check_mark: |
| resnet50v1_int8 | 1 | 947.01 | 969.54 | -2.32% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,153.59 | 1,172.44 | -1.61% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.73 | 362.82 | -1.95% | :white_check_mark: |
| bert_large_fp16 | 1 | 210.28 | 214.00 | -1.74% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,162.07 | 2,204.43 | -1.92% | :white_check_mark: |
| yolov5s | 1 | 546.47 | 533.64 | 2.41% | :white_check_mark: |
| tinyllama | 1 | 43.39 | 43.45 | -0.15% | :white_check_mark: |
| vicuna-fastchat | 1 | 169.47 | 168.60 | 0.51% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 417.94 | 417.95 | -0.00% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 435.99 | 426.15 | 2.31% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output