
Warnings upon tuning information mismatch for Convolutions

Open · umangyadav opened this pull request on Oct 19 '22 · 6 comments

This PR adds the MIGRAPHX_ENABLE_TUNING_WARNINGS environment flag. When the flag is set, MIGraphX warns (or errors out) if the saved tuning information was produced with a different version or architecture setup than the machine on which the model is being run.

It warns in the following cases, as sketched below:

  1. MIOpen version mismatch. This relies on the miopenStatusVersionMismatch return code from MIOpen's loadSolution call.
  2. GFX arch and CU count mismatch between the device the code was compiled and saved for and the device it runs on.
  3. MIGraphX version mismatch.

— umangyadav, Oct 19 '22
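
A minimal sketch of how the env-gated checks above might look. Everything here is illustrative: `tuning_metadata`, `check_tuning_info`, and the field names are hypothetical stand-ins, not the PR's actual code; only the `MIGRAPHX_ENABLE_TUNING_WARNINGS` variable comes from the PR itself.

```cpp
// Illustrative sketch only: names below are hypothetical, not MIGraphX code.
#include <cstdlib>
#include <iostream>
#include <string>

struct tuning_metadata
{
    std::string gfx_arch;       // e.g. "gfx908"
    int cu_count;               // compute units of the device
    std::string miopen_version; // MIOpen version the solutions were saved with
};

// The PR gates the warnings behind this environment variable.
bool tuning_warnings_enabled()
{
    return std::getenv("MIGRAPHX_ENABLE_TUNING_WARNINGS") != nullptr;
}

void check_tuning_info(const tuning_metadata& saved, const tuning_metadata& current)
{
    if(not tuning_warnings_enabled())
        return;
    if(saved.gfx_arch != current.gfx_arch)
        std::cerr << "WARNING: tuning info saved for " << saved.gfx_arch
                  << " but running on " << current.gfx_arch << "\n";
    if(saved.cu_count != current.cu_count)
        std::cerr << "WARNING: tuning info saved with " << saved.cu_count
                  << " CUs but this device has " << current.cu_count << " CUs\n";
    if(saved.miopen_version != current.miopen_version)
        std::cerr << "WARNING: tuning info saved with MIOpen "
                  << saved.miopen_version << ", current is "
                  << current.miopen_version << "\n";
}
```

With the flag unset, the checks are skipped entirely; setting `MIGRAPHX_ENABLE_TUNING_WARNINGS=1` in the environment surfaces the mismatches. (In the actual PR, the MIOpen version check relies on the `miopenStatusVersionMismatch` return code from loadSolution rather than a stored string.)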

| Test | Rate new (4b4566) | Rate old (7d6227) | Diff | Compare |
| --- | --- | --- | --- | --- |
| torchvision-resnet50 | 2,246.84 | 2,246.48 | 0.02% | :white_check_mark: |
| torchvision-resnet50_fp16 | 4,866.33 | 4,867.40 | -0.02% | :white_check_mark: |
| torchvision-alexnet | 4,975.74 | 4,978.41 | -0.05% | :red_circle: |
| torchvision-alexnet_fp16 | 25,423.36 | 25,334.39 | 0.35% | :high_brightness: |
| torchvision-densenet121 | 1,817.19 | 1,812.31 | 0.27% | :high_brightness: |
| torchvision-densenet121_fp16 | 3,262.29 | 3,265.52 | -0.10% | :white_check_mark: |
| torchvision-inceptionv3 | 1,145.78 | 1,145.55 | 0.02% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 2,057.41 | 2,052.23 | 0.25% | :white_check_mark: |
| torchvision-vgg16 | 895.83 | 895.71 | 0.01% | :white_check_mark: |
| torchvision-vgg16_fp16 | 1,742.92 | 1,743.37 | -0.03% | :white_check_mark: |
| cadene-inceptionv4 | 540.47 | 540.56 | -0.02% | :white_check_mark: |
| cadene-resnext64x4 | 579.73 | 579.55 | 0.03% | :high_brightness: |
| slim-mobilenet | 7,166.68 | 7,168.35 | -0.02% | :white_check_mark: |
| slim-nasnetalarge | 220.08 | 220.22 | -0.07% | :white_check_mark: |
| slim-resnet50v2 | 2,341.93 | 2,338.65 | 0.14% | :high_brightness: |
| bert-mrpc-onnx | 807.67 | 807.10 | 0.07% | :high_brightness: |
| bert-mrpc-tf | 302.54 | 302.53 | 0.00% | :white_check_mark: |
| pytorch-examples-wlang-gru | 321.06 | 310.47 | 3.41% | :high_brightness: |
| pytorch-examples-wlang-lstm | 342.56 | 344.19 | -0.47% | :white_check_mark: |
| torchvision-resnet50_1 | 516.36 | 517.21 | -0.17% | :white_check_mark: |
| torchvision-inceptionv3_1 | 308.36 | 307.79 | 0.19% | :white_check_mark: |
| torchvision-vgg16_1 | 463.06 | 462.89 | 0.04% | :white_check_mark: |
| cadene-dpn92_1 | 315.18 | 319.73 | -1.42% | :red_circle: |
| cadene-resnext101_1 | 234.48 | 234.63 | -0.06% | :white_check_mark: |
| slim-vgg16_1 | 64.03 | 64.00 | 0.06% | :high_brightness: |
| slim-mobilenet_1 | 1,482.46 | 1,480.44 | 0.14% | :white_check_mark: |
| slim-inceptionv4_1 | 194.09 | 195.48 | -0.71% | :red_circle: |
| onnx-taau-downsample | 295.87 | 295.70 | 0.06% | :white_check_mark: |

This build is not recommended to merge :red_circle:

— migraphx-bot, Oct 19 '22

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.74%. Comparing base (7d62275) to head (d68b2fe). Report is 711 commits behind head on develop.

:exclamation: Current head d68b2fe differs from the pull request's most recent head 4b45667. Consider uploading reports for the commit 4b45667 to get more accurate results.

Additional details and impacted files
```diff
@@           Coverage Diff            @@
##           develop    #1417   +/-   ##
========================================
  Coverage    92.74%   92.74%
========================================
  Files          447      447
  Lines        14943    14948    +5
========================================
+ Hits         13859    13864    +5
  Misses        1084     1084
```


— codecov[bot], Oct 19 '22

> Apart from CI this looks good.

Yes, I'll fix tidy.

> Just to be clear, since there isn't a description: you're adding better warnings for the convolution operators based on MIOpen version or CU core count?

The warnings are for cases where the tuning information stored in the .mxr file doesn't match the user's configuration. In that case, the user needs to enable the tuning flag to get optimal performance.

> Unit test change is to handle the extra args you added (gfx_arch, cu_count, miopen_version) from how things are serialized?

Yes, the size of the context changes because of that (see the sketch below).

— umangyadav, Oct 21 '22
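
The "size of the context changes" point above can be illustrated with a small sketch. The struct and `serialize` function below are hypothetical stand-ins (MIGraphX's real serialization works differently); only the three field names come from the discussion.

```cpp
// Hypothetical sketch of why the serialized context grows: three extra
// fields ride along with the saved tuning data.
#include <cstdint>
#include <string>
#include <vector>

struct context_info
{
    std::string gfx_arch;       // new field per this PR
    std::uint32_t cu_count;     // new field per this PR
    std::string miopen_version; // new field per this PR
};

// Appending the extra fields changes the byte size of the serialized blob,
// which is why size-sensitive unit tests needed updating.
std::vector<char> serialize(const context_info& info)
{
    std::vector<char> out;
    auto append_str = [&](const std::string& s) {
        out.insert(out.end(), s.begin(), s.end());
        out.push_back('\0'); // simple delimiter, just for this sketch
    };
    append_str(info.gfx_arch);
    append_str(std::to_string(info.cu_count));
    append_str(info.miopen_version);
    return out;
}
```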

All good, just approved. Looks good; I just wanted to make sure I understood everything here.

— TedThemistokleous, Oct 21 '22

> We should throw an error for a different gfx arch when the model has binary code objects, since the model cannot run on a different arch. We will need to check this in finalize so we aren't throwing errors when they aren't applicable.

~I disagree on this one. Users should be able to share/transfer models across different gfx archs. Erroring out would not allow that.~

> We could consider adding a logger similar to MIOpen's, with different levels. The warnings would show by default, but the user could set a higher log level to surface less important info (such as CU or MIOpen version differences).

Logging with levels is a good idea; I can work on it in a separate work item (a sketch follows below). But I think for now we should keep the warnings for MIOpen version and CU mismatch.

— umangyadav, Oct 28 '22
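
A sketch of the leveled-logging idea discussed above, loosely modeled on MIOpen's `MIOPEN_LOG_LEVEL`. The `MIGRAPHX_LOG_LEVEL` variable and all names here are hypothetical; this is not an implemented MIGraphX API.

```cpp
// Hypothetical leveled logger: warnings visible by default, less important
// diagnostics (CU count, MIOpen version) only at a higher level.
#include <cstdlib>
#include <iostream>
#include <string>

enum class log_level { none = 0, error = 1, warning = 2, info = 3 };

log_level current_level()
{
    const char* env = std::getenv("MIGRAPHX_LOG_LEVEL"); // hypothetical variable
    if(env == nullptr)
        return log_level::warning; // warnings show by default, per the proposal
    return static_cast<log_level>(std::atoi(env));
}

void log_msg(log_level level, const std::string& msg)
{
    if(level <= current_level())
        std::cerr << msg << "\n";
}
```

Under this scheme, a gfx-arch mismatch could be logged at `warning` while CU-count or MIOpen-version differences go to `info`, so they only appear when the user raises the level.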

> Logging with levels is a good idea; I can work on it in a separate work item. But I think for now we should keep the warnings for MIOpen version and CU mismatch.

You can still put the warnings behind an env variable. Also, the warnings should probably go in finalize so they fire only when MIOpen is actually used; they are completely pointless when running a model with no convolutions, such as BERT. (A sketch of the finalize placement follows below.)

— pfultz2, Oct 28 '22
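
A sketch of the finalize placement suggested above: the check lives on the MIOpen convolution op itself, so it only runs for programs that actually use MIOpen. The class and member names are hypothetical; only the idea of a per-op finalize step comes from the discussion.

```cpp
// Hypothetical op with a finalize-time check. Models with no convolutions
// (e.g. BERT) never instantiate this op, so they never see the warning.
#include <iostream>

struct miopen_convolution_op
{
    bool tuning_info_matches_device = false; // set when solutions are loaded

    void finalize()
    {
        if(not tuning_info_matches_device)
            std::cerr << "WARNING: saved convolution tuning does not match "
                         "this device; re-tune for optimal performance\n";
    }
};
```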