
PostTrainingQuantConfig(quant_level='auto', device='npu', backend="onnxrt_dml_ep") produces fp32 ops.

Open · kleiti opened this issue 1 year ago · 1 comment

The `PostTrainingQuantConfig` below produces fp32 ops for the NPU with neural-compressor 2.4.1. Models with int8 and fp16 ops would be preferred for the NPU.

```python
conf = PostTrainingQuantConfig(
    quant_level="auto",
    device="npu",
    backend="onnxrt_dml_ep",
    quant_format="QOperator",
    approach="static",
    excluded_precisions=["bf16"],
)
```

[screenshot: graph of the quantized model, showing ops left in fp32]

kleiti avatar Jan 26 '24 11:01 kleiti

Hi @kleiti, the onnxrt_dml_ep backend is experimental and currently we only support int8 MatMul. We will enhance its functionality later.
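One way to confirm this limitation is to tally quantized vs. float ops in the exported ONNX graph. Below is a minimal sketch; the `quantization_summary` helper and the example op list are hypothetical, and in a real model the op types would come from `[n.op_type for n in onnx.load(path).graph.node]`:

```python
from collections import Counter

# Standard ONNX quantized op types (QLinear* family plus the
# quantize/dequantize and integer-compute ops).
_QUANT_OPS = {"QuantizeLinear", "DequantizeLinear", "MatMulInteger", "ConvInteger"}

def quantization_summary(op_types):
    """Split a graph's node op_type strings into quantized and float ops.

    op_types: list of node op_type strings from an ONNX graph.
    Returns (quantized_counts, float_counts) as dicts.
    """
    counts = Counter(op_types)
    quantized = {op: c for op, c in counts.items()
                 if op.startswith("QLinear") or op in _QUANT_OPS}
    floating = {op: c for op, c in counts.items() if op not in quantized}
    return quantized, floating

# Hypothetical example: a graph where only MatMul was quantized, while
# Conv and Relu stayed fp32 -- the pattern reported in this issue.
q, f = quantization_summary(
    ["QuantizeLinear", "MatMulInteger", "DequantizeLinear",
     "Conv", "Relu", "Conv"])
# q -> {'QuantizeLinear': 1, 'MatMulInteger': 1, 'DequantizeLinear': 1}
# f -> {'Conv': 2, 'Relu': 1}
```

If only MatMul-related ops show up in the quantized tally, the rest of the graph is still running in fp32 on the NPU.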

mengniwang95 avatar Feb 19 '24 07:02 mengniwang95