AMDMIGraphX
ONNX parsers for Quantization & Dequantization: case when the scale and zero-point tensors have the same dimensions as the input tensor
Currently these parsers try to either broadcast or multibroadcast. These two operators also handle scales in a somewhat inconsistent manner.
| Test | Batch | Rate new 9ddf05 | Rate old 01c94f | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,232.34 | 3,236.49 | -0.13% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,876.71 | 6,885.59 | -0.13% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,425.46 | 2,430.36 | -0.20% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,051.95 | 4,079.72 | -0.68% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,632.41 | 1,634.52 | -0.13% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,730.45 | 2,737.62 | -0.26% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 770.82 | 770.69 | 0.02% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 806.07 | 807.28 | -0.15% | :white_check_mark: |
| slim-mobilenet | 64 | 7,431.64 | 7,438.46 | -0.09% | :white_check_mark: |
| slim-nasnetalarge | 64 | 207.18 | 207.42 | -0.12% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,333.61 | 3,340.22 | -0.20% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,154.11 | 1,149.04 | 0.44% | :white_check_mark: |
| bert-mrpc-tf | 1 | 309.37 | 311.11 | -0.56% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 417.93 | 431.85 | -3.22% | :red_circle: |
| pytorch-examples-wlang-lstm | 1 | 384.33 | 386.06 | -0.45% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 806.37 | 801.13 | 0.65% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 433.65 | 399.05 | 8.67% | :high_brightness: |
| cadene-resnext101_1 | 1 | 378.70 | 376.62 | 0.55% | :white_check_mark: |
| onnx-taau-downsample | 1 | 345.15 | 344.52 | 0.18% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 35.01 | 35.06 | -0.15% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 57.32 | 57.35 | -0.05% | :white_check_mark: |
| agentmodel | 1 | 9,638.09 | 7,988.11 | 20.66% | :high_brightness: |
| unet_fp16 | 2 | 57.75 | 57.75 | -0.00% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 910.92 | 932.17 | -2.28% | :white_check_mark: |
| resnet50v1_int8 | 1 | 932.95 | 947.12 | -1.50% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,139.70 | 1,141.18 | -0.13% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 350.00 | 351.92 | -0.55% | :white_check_mark: |
| bert_large_fp16 | 1 | 211.05 | 208.10 | 1.42% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,146.30 | 2,154.47 | -0.38% | :white_check_mark: |
| yolov5s | 1 | 506.11 | 504.76 | 0.27% | :white_check_mark: |
| tinyllama | 1 | 43.32 | 43.37 | -0.13% | :white_check_mark: |
| vicuna-fastchat | 1 | 173.19 | 178.11 | -2.76% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 410.06 | 410.98 | -0.22% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 430.68 | 421.85 | 2.09% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
> case when scales and zero point tensors of same dimension as the input tensor

I don't think the scales and zero points are ever the same shape as the input:

- Per-tensor: the scales and zero points are scalars, so we insert a simple broadcast.
- Per-axis quantization: the scales and zero points are a 1-D tensor that is broadcast across the `axis` (which is defined as an attribute in the ONNX operator, see here). Since it is 1-D we need to know the `axis` (we can't use a multibroadcast here).
- Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).

So I don't see how they could ever be the same shape.
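A hypothetical NumPy sketch of how the scale tensor broadcasts in each of the three granularities described above (shapes, names, and values here are illustrative assumptions, not MIGraphX code):

```python
import numpy as np

x = np.arange(24, dtype=np.int8).reshape(2, 3, 4)  # quantized input

# Per-tensor: a scalar scale broadcasts trivially.
scale_scalar = np.float32(0.5)
y_tensor = x.astype(np.float32) * scale_scalar

# Per-axis (axis=1): a 1-D scale of length x.shape[axis]; it must be
# reshaped so it broadcasts along that axis -- which is why the axis
# has to be known and a plain multibroadcast is not enough.
axis = 1
scale_axis = np.array([0.1, 0.2, 0.4], dtype=np.float32)
shape = [1] * x.ndim
shape[axis] = x.shape[axis]
y_axis = x.astype(np.float32) * scale_axis.reshape(shape)

# Blocked (axis=2, block_size=2): the scale has the same rank as x but
# x.shape[axis] // block_size entries along `axis`; repeating each
# entry block_size times recovers the input shape.
block_size = 2
scale_block = np.full((2, 3, 4 // block_size), 0.25, dtype=np.float32)
y_block = x.astype(np.float32) * np.repeat(scale_block, block_size, axis=2)

print(y_tensor.shape, y_axis.shape, y_block.shape)
```

In all three cases the dequantized output has the input's shape; only the degenerate `block_size = 1` case makes the scale itself match that shape.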
> Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).

@pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume `block_size = 1`. In this simple case the scale tensor has the same shape as the input weights.
Code changes look fine, should have ONNX parse tests to verify what it does.
There are tests being added to another PR, along with int4 type tensors in it. If you like I can add one separately, just let me know, thanks.
(BTW, this is just a degenerate case where no broadcast or multibroadcast is required for the scales or zero_points.)
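A minimal sketch of that degenerate case, assuming `block_size = 1` so the scale tensor has exactly the input's shape and dequantization is a plain elementwise operation with no broadcast or multibroadcast inserted (illustrative values, not MIGraphX code):

```python
import numpy as np

w_q = np.array([[-3, 5], [7, -2]], dtype=np.int8)             # quantized weights
scale = np.array([[0.5, 0.25], [0.1, 1.0]], dtype=np.float32)  # per-element scales
zero_point = np.zeros_like(w_q)

assert scale.shape == w_q.shape  # same shape: no broadcast op needed
w = (w_q.astype(np.float32) - zero_point) * scale
print(w)
```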
> > Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).
>
> @pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume `block_size = 1`. In this simple case the scale tensor has the same shape as the input weights.
#3412 already handles block quantization for any block_size.
> #3412 already handles block quantization for any block_size.
'Already'? As in back to the future? :-)
> Code changes look fine, should have ONNX parse tests to verify what it does.

> There are tests being added to another PR, along with int4 type tensors in it. If you like I can add one separately, just let me know, thanks.
>
> (BTW, this is just a degenerate case where no broadcast or multibroadcast is required for the scales or zero_points.)
It would be better to have a test for this degenerate case, but it is not essential right now. Approving.
> Would be better to have a test for this degenerate case, but it is not essential right now. Approving.
Thanks for approving, Charlie. The test graph for the degenerate case has actually already been added in a different PR.
I don't think this should be merged. #3412 already handles this case, and if we merge this PR it is just going to cause merge conflicts for #3412. I don't think it makes sense to have all the extra churn for a "degenerate" case that is not used by any models.
Thanks, @pfultz2, yes, at this point we don't want the churn. I would have appreciated your approval earlier: this work was done 3 weeks ago and is now wasted.
Closing this one. This work has subsequently been done in #3412.