Warnings upon tuning information mismatch for Convolutions
This PR adds MIGRAPHX_ENABLE_TUNING_WARNINGS environment flag.
Based on this flag it would warn/error out if saved Tuning information has a different version, and architecture setup compared to the machine on which the model is being run.
It warns on the following cases :
- MIOpen version mismatch. It relies on
miopenStatusVersionMismatchreturn code from callingloadSolutionof the MIOpen. - GFX Arch and CU count mismatch for the compiled and saved code.
- MIGraphX version mismatch
| Test | Rate new 4b4566 |
Rate old 7d6227 |
Diff | Compare |
|---|---|---|---|---|
| torchvision-resnet50 | 2,246.84 | 2,246.48 | 0.02% | :white_check_mark: |
| torchvision-resnet50_fp16 | 4,866.33 | 4,867.40 | -0.02% | :white_check_mark: |
| torchvision-alexnet | 4,975.74 | 4,978.41 | -0.05% | :red_circle: |
| torchvision-alexnet_fp16 | 25,423.36 | 25,334.39 | 0.35% | :high_brightness: |
| torchvision-densenet121 | 1,817.19 | 1,812.31 | 0.27% | :high_brightness: |
| torchvision-densenet121_fp16 | 3,262.29 | 3,265.52 | -0.10% | :white_check_mark: |
| torchvision-inceptionv3 | 1,145.78 | 1,145.55 | 0.02% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 2,057.41 | 2,052.23 | 0.25% | :white_check_mark: |
| torchvision-vgg16 | 895.83 | 895.71 | 0.01% | :white_check_mark: |
| torchvision-vgg16_fp16 | 1,742.92 | 1,743.37 | -0.03% | :white_check_mark: |
| cadene-inceptionv4 | 540.47 | 540.56 | -0.02% | :white_check_mark: |
| cadene-resnext64x4 | 579.73 | 579.55 | 0.03% | :high_brightness: |
| slim-mobilenet | 7,166.68 | 7,168.35 | -0.02% | :white_check_mark: |
| slim-nasnetalarge | 220.08 | 220.22 | -0.07% | :white_check_mark: |
| slim-resnet50v2 | 2,341.93 | 2,338.65 | 0.14% | :high_brightness: |
| bert-mrpc-onnx | 807.67 | 807.10 | 0.07% | :high_brightness: |
| bert-mrpc-tf | 302.54 | 302.53 | 0.00% | :white_check_mark: |
| pytorch-examples-wlang-gru | 321.06 | 310.47 | 3.41% | :high_brightness: |
| pytorch-examples-wlang-lstm | 342.56 | 344.19 | -0.47% | :white_check_mark: |
| torchvision-resnet50_1 | 516.36 | 517.21 | -0.17% | :white_check_mark: |
| torchvision-inceptionv3_1 | 308.36 | 307.79 | 0.19% | :white_check_mark: |
| torchvision-vgg16_1 | 463.06 | 462.89 | 0.04% | :white_check_mark: |
| cadene-dpn92_1 | 315.18 | 319.73 | -1.42% | :red_circle: |
| cadene-resnext101_1 | 234.48 | 234.63 | -0.06% | :white_check_mark: |
| slim-vgg16_1 | 64.03 | 64.00 | 0.06% | :high_brightness: |
| slim-mobilenet_1 | 1,482.46 | 1,480.44 | 0.14% | :white_check_mark: |
| slim-inceptionv4_1 | 194.09 | 195.48 | -0.71% | :red_circle: |
| onnx-taau-downsample | 295.87 | 295.70 | 0.06% | :white_check_mark: |
This build is not recommended to merge :red_circle:
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.74%. Comparing base (
7d62275) to head (d68b2fe). Report is 711 commits behind head on develop.
:exclamation: Current head d68b2fe differs from pull request most recent head 4b45667. Consider uploading reports for the commit 4b45667 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## develop #1417 +/- ##
========================================
Coverage 92.74% 92.74%
========================================
Files 447 447
Lines 14943 14948 +5
========================================
+ Hits 13859 13864 +5
Misses 1084 1084
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Apart from CI this looks good.
Yes, i'll fix tidy.
Just to be clear, since there isn't a description You're adding better warnings for the convolution operators and based on MIOpen version, or CU core count
warning are for the cases where tuning information stored in mxr file doesn't match the user's configuration. In that case, users are required to enable tuning flag to get optimal performance.
Unit test change is to handle the extra args you added (gfx_arch, cu_count, miopen_version) from how things are serialized?
Yes, size of thecontext changes because of that.
Apart from CI this looks good.
Yes, i'll fix tidy.
Just to be clear, since there isn't a description You're adding better warnings for the convolution operators and based on MIOpen version, or CU core count
warning are for the cases where tuning information stored in mxr file doesn't match the user's configuration. In that case, users are required to enable tuning flag to get optimal performance.
Unit test change is to handle the extra args you added (gfx_arch, cu_count, miopen_version) from how things are serialized?
Yes, size of the
contextchanges because of that.
All good. Just approved looks good. Just wanted to make sure I understood everything here.
We should throw an error for different gfx arch when the model has binary code objects since the model cannot run on a different arch. We will need to check this in
finalizeso we aren't throwing errors when they aren't applicable.
~I disagree on this one. Users should be able to share/transfer models across different gfx-arch. Erroring out would not allow that.~
We could consider adding a logger similar to MIOpen with different levels. So the warnings will show by default but we can have higher log level the user can show to enable this less important info(such as CUs or MIOpen version difference).
Logging with levels is a good idea. I can work on Logging with different levels in a separate work item. But i think for now, we should keep warnings for MIOpen version and CUs mismatch.
Logging with levels is a good idea. I can work on Logging with different levels in a separate work item. But i think for now, we should keep warnings for MIOpen version and CUs mismatch.
You can still put the warnings behind an env variable. Also the warnings should probably go in finalize so it warns when MIOpen is actually used. They are completely pointless when using BERT.