AMDMIGraphX
ONNX parsers for Quantization & Dequantization: case when the scale and zero-point tensors have the same dimensions as the input tensor
Currently these parsers try to either broadcast or multibroadcast. These two operators also handle scales in a somewhat inconsistent manner.
| Test | Batch | Rate new 9ddf05 | Rate old 01c94f | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,232.34 | 3,236.49 | -0.13% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,876.71 | 6,885.59 | -0.13% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,425.46 | 2,430.36 | -0.20% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,051.95 | 4,079.72 | -0.68% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,632.41 | 1,634.52 | -0.13% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,730.45 | 2,737.62 | -0.26% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 770.82 | 770.69 | 0.02% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 806.07 | 807.28 | -0.15% | :white_check_mark: |
| slim-mobilenet | 64 | 7,431.64 | 7,438.46 | -0.09% | :white_check_mark: |
| slim-nasnetalarge | 64 | 207.18 | 207.42 | -0.12% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,333.61 | 3,340.22 | -0.20% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,154.11 | 1,149.04 | 0.44% | :white_check_mark: |
| bert-mrpc-tf | 1 | 309.37 | 311.11 | -0.56% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 417.93 | 431.85 | -3.22% | :red_circle: |
| pytorch-examples-wlang-lstm | 1 | 384.33 | 386.06 | -0.45% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 806.37 | 801.13 | 0.65% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 433.65 | 399.05 | 8.67% | :high_brightness: |
| cadene-resnext101_1 | 1 | 378.70 | 376.62 | 0.55% | :white_check_mark: |
| onnx-taau-downsample | 1 | 345.15 | 344.52 | 0.18% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 35.01 | 35.06 | -0.15% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 57.32 | 57.35 | -0.05% | :white_check_mark: |
| agentmodel | 1 | 9,638.09 | 7,988.11 | 20.66% | :high_brightness: |
| unet_fp16 | 2 | 57.75 | 57.75 | -0.00% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 910.92 | 932.17 | -2.28% | :white_check_mark: |
| resnet50v1_int8 | 1 | 932.95 | 947.12 | -1.50% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,139.70 | 1,141.18 | -0.13% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 350.00 | 351.92 | -0.55% | :white_check_mark: |
| bert_large_fp16 | 1 | 211.05 | 208.10 | 1.42% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,146.30 | 2,154.47 | -0.38% | :white_check_mark: |
| yolov5s | 1 | 506.11 | 504.76 | 0.27% | :white_check_mark: |
| tinyllama | 1 | 43.32 | 43.37 | -0.13% | :white_check_mark: |
| vicuna-fastchat | 1 | 173.19 | 178.11 | -2.76% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 410.06 | 410.98 | -0.22% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 430.68 | 421.85 | 2.09% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
> case when scales and zero point tensors of same dimension as the input tensor

I don't think the scales and zero points are ever the same shape as the input:

- Per-tensor: the scales and zero points are scalars, so we insert a simple broadcast.
- Per-axis quantization: the scales and zero points are a 1-D tensor that is broadcast across the `axis` (which is defined as an attribute in the ONNX operator, see here). Since it is 1-D we need to know the `axis` (we can't use a multibroadcast here).
- Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).

So I don't see how they could ever be the same shape.
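A hypothetical NumPy sketch of how the scale tensor broadcasts in each of the three granularities described above (shapes, names, and values here are illustrative assumptions, not MIGraphX code):

```python
import numpy as np

x = np.arange(24, dtype=np.int8).reshape(2, 3, 4)  # quantized input

# Per-tensor: a scalar scale broadcasts trivially.
scale_scalar = np.float32(0.5)
y_tensor = x.astype(np.float32) * scale_scalar

# Per-axis (axis=1): a 1-D scale of length x.shape[axis]; it must be
# reshaped so it broadcasts along that axis -- which is why the axis
# has to be known and a plain multibroadcast is not enough.
axis = 1
scale_axis = np.array([0.1, 0.2, 0.4], dtype=np.float32)
shape = [1] * x.ndim
shape[axis] = x.shape[axis]
y_axis = x.astype(np.float32) * scale_axis.reshape(shape)

# Blocked (axis=2, block_size=2): the scale has the same rank as x but
# x.shape[axis] // block_size entries along `axis`; repeating each
# entry block_size times recovers the input shape.
block_size = 2
scale_block = np.full((2, 3, 4 // block_size), 0.25, dtype=np.float32)
y_block = x.astype(np.float32) * np.repeat(scale_block, block_size, axis=2)

print(y_tensor.shape, y_axis.shape, y_block.shape)
```

In all three cases the dequantized output has the input's shape; only the degenerate `block_size = 1` case makes the scale itself match that shape.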
> Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).

@pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume `block_size = 1`. In this simple case the scale tensor has the same shape as the input weights.
Code changes look fine, should have ONNX parse tests to verify what it does.
There are tests being added to another PR, along with int4 type tensors in it. If you like I can add one separately, just let me know, thanks.
(BTW, this is just a degenerate case where no broadcast or multibroadcast is required for the scales or zero_points.)
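A minimal sketch of that degenerate case, assuming `block_size = 1` so the scale tensor has exactly the input's shape and dequantization is a plain elementwise operation with no broadcast or multibroadcast inserted (illustrative values, not MIGraphX code):

```python
import numpy as np

w_q = np.array([[-3, 5], [7, -2]], dtype=np.int8)             # quantized weights
scale = np.array([[0.5, 0.25], [0.1, 1.0]], dtype=np.float32)  # per-element scales
zero_point = np.zeros_like(w_q)

assert scale.shape == w_q.shape  # same shape: no broadcast op needed
w = (w_q.astype(np.float32) - zero_point) * scale
print(w)
```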
> > Blocked quantization: the scales and zero points have the same rank as the input but different dimensions. The `axis` of the dimension that differs from the input should be set to the `block_size` (both `axis` and `block_size` are attributes defined in the ONNX operator, see here).
>
> @pfultz2, I am trying to work with an int4 model while blocked quantization is not yet implemented, and I assume `block_size = 1`. In this simple case the scale tensor has the same shape as the input weights.
#3412 already handles block quantization for any block_size.
> #3412 already handles block quantization for any block_size.
'Already'? As in back to the future? :-)
> Code changes look fine, should have ONNX parse tests to verify what it does.

> There are tests being added to another PR, along with int4 type tensors in it. If you like I can add one separately, just let me know, thanks.
>
> (BTW, this is just a degenerate case where no broadcast or multibroadcast is required for the scales or zero_points.)
It would be better to have a test for this degenerate case, but it is not essential right now. Approving.
> Would be better to have a test for this degenerate case, but it is not essential right now. Approving.
Thanks for approving, Charlie. The test graph for the degenerate case has actually already been added in a different PR.
I don't think this should be merged. #3412 already handles this case, and if we merge this PR it is just going to cause merge conflicts for #3412. I don't think it makes sense to have all the extra churn for a "degenerate" case that is not used by any models.
Thanks, @pfultz2, yes, at this point we don't want the churn. I would have appreciated your approval earlier: this work was done 3 weeks ago and is now wasted.
Closing this one. This work has subsequently been done in #3412.