Do AIMET and the QNN HTP backend support weight-only quantization?
Do AIMET and the QNN HTP backend support weight-only quantization? For example, activations stay in fp16 while weights are quantized to 4-bit/8-bit.
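To make the question concrete, this is roughly what I am trying to set up with AIMET's PyTorch `QuantizationSimModel` (a minimal sketch assuming the aimet_torch 1.x API; quantizer attribute names may differ between releases, and the model here is just a stand-in `Linear` layer):

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Linear(64, 128).eval()
dummy_input = torch.randn(1, 64)

# Ask for 4-bit weights; the activation bitwidth is a placeholder,
# since the activation quantizers are disabled below anyway.
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           default_param_bw=4,
                           default_output_bw=16)

# Disable every input/output (activation) quantizer so that only
# the weights are fake-quantized, i.e. weight-only quantization.
for module in sim.model.modules():
    if hasattr(module, 'param_quantizers'):
        for q in list(module.input_quantizers) + list(module.output_quantizers):
            q.enabled = False

# Calibrate the remaining (weight) encodings with a forward pass.
sim.compute_encodings(lambda m, _: m(dummy_input), None)
```

Simulating this in AIMET is one thing; the real question is whether such a model (fp16 activations, quantized weights) can actually be deployed on the QNN HTP backend.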
Alternatively, does the QNN HTP backend support fp16 MatMul? If so, it might be feasible to manually dequantize the weights and then perform an fp16 matmul. The QNN operation documentation says the HTP backend supports fp16 FullyConnected, but I failed to run it on device.
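For reference, the fallback I have in mind looks like this in plain PyTorch (per-tensor affine dequantization; all names and values are illustrative, and older CPU-only PyTorch builds may not support fp16 matmul):

```python
import torch

def dequant_matmul_fp16(x_fp16, w_int8, scale, zero_point):
    # Dequantize int8 weights to fp16, then do the matmul in fp16.
    w_fp16 = (w_int8.to(torch.float16) - zero_point) * scale
    return x_fp16 @ w_fp16.t()

x = torch.randn(2, 64, dtype=torch.float16)
w_q = torch.randint(-128, 127, (128, 64), dtype=torch.int8)
scale = torch.tensor(0.02, dtype=torch.float16)
zp = torch.tensor(0.0, dtype=torch.float16)

y = dequant_matmul_fp16(x, w_q, scale, zp)  # shape (2, 128)
```

The question is whether the equivalent graph (Dequantize followed by fp16 MatMul/FullyConnected) can run on HTP, since my attempts so far have failed on device.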