
ONNX parsers for Quantization & Dequantization: case when the scales and zero-point tensors have the same dimensions as the input tensor

Open lakhinderwalia opened this issue 1 year ago • 10 comments

ONNX parsers for Quantization & Dequantization, case when the scales and zero-point tensors have the same dimensions as the input tensor: currently these parsers try to either broadcast or multibroadcast. These two operators also already handle scales in a somewhat inconsistent manner.

lakhinderwalia avatar Aug 17 '24 02:08 lakhinderwalia

Test Batch Rate new (9ddf05) Rate old (01c94f) Diff Compare
torchvision-resnet50 64 3,232.34 3,236.49 -0.13% :white_check_mark:
torchvision-resnet50_fp16 64 6,876.71 6,885.59 -0.13% :white_check_mark:
torchvision-densenet121 32 2,425.46 2,430.36 -0.20% :white_check_mark:
torchvision-densenet121_fp16 32 4,051.95 4,079.72 -0.68% :white_check_mark:
torchvision-inceptionv3 32 1,632.41 1,634.52 -0.13% :white_check_mark:
torchvision-inceptionv3_fp16 32 2,730.45 2,737.62 -0.26% :white_check_mark:
cadene-inceptionv4 16 770.82 770.69 0.02% :white_check_mark:
cadene-resnext64x4 16 806.07 807.28 -0.15% :white_check_mark:
slim-mobilenet 64 7,431.64 7,438.46 -0.09% :white_check_mark:
slim-nasnetalarge 64 207.18 207.42 -0.12% :white_check_mark:
slim-resnet50v2 64 3,333.61 3,340.22 -0.20% :white_check_mark:
bert-mrpc-onnx 8 1,154.11 1,149.04 0.44% :white_check_mark:
bert-mrpc-tf 1 309.37 311.11 -0.56% :white_check_mark:
pytorch-examples-wlang-gru 1 417.93 431.85 -3.22% :red_circle:
pytorch-examples-wlang-lstm 1 384.33 386.06 -0.45% :white_check_mark:
torchvision-resnet50_1 1 806.37 801.13 0.65% :white_check_mark:
cadene-dpn92_1 1 433.65 399.05 8.67% :high_brightness:
cadene-resnext101_1 1 378.70 376.62 0.55% :white_check_mark:
onnx-taau-downsample 1 345.15 344.52 0.18% :white_check_mark:
dlrm-criteoterabyte 1 35.01 35.06 -0.15% :white_check_mark:
dlrm-criteoterabyte_fp16 1 57.32 57.35 -0.05% :white_check_mark:
agentmodel 1 9,638.09 7,988.11 20.66% :high_brightness:
unet_fp16 2 57.75 57.75 -0.00% :white_check_mark:
resnet50v1_fp16 1 910.92 932.17 -2.28% :white_check_mark:
resnet50v1_int8 1 932.95 947.12 -1.50% :white_check_mark:
bert_base_cased_fp16 64 1,139.70 1,141.18 -0.13% :white_check_mark:
bert_large_uncased_fp16 32 350.00 351.92 -0.55% :white_check_mark:
bert_large_fp16 1 211.05 208.10 1.42% :white_check_mark:
distilgpt2_fp16 16 2,146.30 2,154.47 -0.38% :white_check_mark:
yolov5s 1 506.11 504.76 0.27% :white_check_mark:
tinyllama 1 43.32 43.37 -0.13% :white_check_mark:
vicuna-fastchat 1 173.19 178.11 -2.76% :white_check_mark:
whisper-tiny-encoder 1 410.06 410.98 -0.22% :white_check_mark:
whisper-tiny-decoder 1 430.68 421.85 2.09% :white_check_mark:

This build is not recommended to merge :red_circle:

migraphx-bot avatar Aug 17 '24 06:08 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Aug 17 '24 06:08 migraphx-bot

case when the scales and zero-point tensors have the same dimensions as the input tensor:

I don't think the scales and zero points are ever the same shape as the input:

  • Per-tensor: The scales and zero points are scalars, so we insert a simple broadcast.
  • Per-axis quantization: The scales and zero points are a 1-D tensor that is broadcast across the axis (which is defined as an attribute in the ONNX operator, see here). Since it is 1-D, we need to know the axis (we can't use a multibroadcast here).
  • Blocked quantization: The scales and zero points have the same rank as the input but different dimensions. The dimension that differs from the input is along axis and is determined by block_size (both axis and block_size are attributes defined in the ONNX operator, see here).

So I don't see how they could ever be the same shape.
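
For concreteness, a rough numpy sketch of how the scale shape relates to the input in each of the three cases above (the shapes, axis and block_size values are illustrative only, not taken from this PR):

```python
import numpy as np

x = np.zeros((4, 8), dtype=np.int8)        # quantized input, shape (4, 8)

# Per-tensor: the scale is a scalar, so a plain scalar broadcast covers the input.
scale_per_tensor = np.float32(0.1)         # shape ()

# Per-axis (axis=1): the scale is 1-D with one entry per slice along that axis.
# It must be lined up with `axis` before broadcasting, so the axis attribute
# has to be known; a multibroadcast alone is not enough.
axis = 1
scale_per_axis = np.full((x.shape[axis],), 0.1, dtype=np.float32)          # shape (8,)

# Blocked (axis=1, block_size=4): same rank as the input, but the dimension
# along `axis` shrinks to ceil(8 / 4) = 2; each scale entry covers one block.
block_size = 4
blocked_dim = -(-x.shape[axis] // block_size)                               # ceil division
scale_blocked = np.full((x.shape[0], blocked_dim), 0.1, dtype=np.float32)   # shape (4, 2)
```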

pfultz2 avatar Aug 19 '24 17:08 pfultz2

  • Blocked quantization: The scales and zero points have the same rank as the input but different dimensions. The dimension that differs from the input is along axis and is determined by block_size (both axis and block_size are attributes defined in the ONNX operator, see here).

@pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume block_size = 1. Thus, in the simple case here, the scale tensor has the same shape as the input weights.
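
As a minimal sketch of that degenerate case (shapes and values below are made up for illustration): with block_size = 1 the scale and zero-point tensors match the weight shape, so dequantization is purely elementwise and no broadcast or multibroadcast needs to be inserted.

```python
import numpy as np

w_q    = np.random.randint(-8, 8, size=(4, 8)).astype(np.int8)   # int4 values held in int8
scales = np.full((4, 8), 0.05, dtype=np.float32)                 # same shape as w_q
zeros  = np.zeros((4, 8), dtype=np.int8)                         # same shape as w_q

# DequantizeLinear semantics: y = (x - zero_point) * scale, here purely elementwise.
w = (w_q.astype(np.float32) - zeros) * scales
assert w.shape == w_q.shape
```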

lakhinderwalia avatar Aug 19 '24 17:08 lakhinderwalia

Code changes look fine, but there should be ONNX parse tests to verify what they do.

Tests are being added in another PR, along with int4-type tensors. If you like, I can add one separately; just let me know, thanks.

(BTW, this is just a degenerate case where no broadcast or multibroadcast is required for scales or zero_points.)

lakhinderwalia avatar Aug 29 '24 22:08 lakhinderwalia

  • Blocked quantization: The scales and zero points have the same rank as the input but different dimensions. The dimension that differs from the input is along axis and is determined by block_size (both axis and block_size are attributes defined in the ONNX operator, see here).

@pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume block_size = 1. Thus, in the simple case here, the scale tensor has the same shape as the input weights.

#3412 already handles block quantization for any block_size.

pfultz2 avatar Sep 04 '24 13:09 pfultz2

#3412 already handles block quantization for any block_size.

'Already'? As in back to the future? :-)

lakhinderwalia avatar Sep 04 '24 15:09 lakhinderwalia

Code changes look fine, but there should be ONNX parse tests to verify what they do.

Tests are being added in another PR, along with int4-type tensors. If you like, I can add one separately; just let me know, thanks.

(BTW, this is just a degenerate case where no broadcast or multibroadcast is required for scales or zero_points.)

It would be better to have a test for this degenerate case, but it's not essential right now. Approving.

CharlieL7 avatar Sep 05 '24 16:09 CharlieL7

It would be better to have a test for this degenerate case, but it's not essential right now. Approving.

Thanks for approving, Charlie. The test graph for the degenerate case has actually already been added in a different PR.

lakhinderwalia avatar Sep 05 '24 17:09 lakhinderwalia

I don't think this should be merged in. #3412 already handles this case, and if we merge this PR in, it's just going to cause merge conflicts for #3412. I don't think it makes sense to have all the extra churn for a "degenerate" case that is not used in any models.

Thanks, @pfultz2, yes, at this point we don't want the churn. I would have appreciated your early approval. This work was done three weeks ago and is now wasted.

lakhinderwalia avatar Sep 05 '24 17:09 lakhinderwalia

Closing this one. This work was subsequently done in #3412.

lakhinderwalia avatar Oct 01 '24 18:10 lakhinderwalia