XNNPACK
Support int8 transposed convolutions with per-channel weight quantization
TFLite uses int8 per-channel weight quantization for transposed convolutions. While XNNPACK includes a fast transposed convolution operation, it only supports per-tensor weight quantization (i.e. a single quantization scale for the whole weight tensor), which means transposed convolutions in a TFLite int8 QAT model are currently not supported by XNNPACK and won't be accelerated.
It would be excellent if XNNPACK added support for per-channel quantized weights to the transposed convolution op, matching the behaviour of the regular convolution.
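For context, here is a minimal sketch of the difference being requested. The filter shape and the output-channel-first layout are made up for illustration, not taken from XNNPACK or TFLite: per-tensor quantization uses one scale for the whole filter, while per-channel quantization keeps one scale per output channel, so small-magnitude channels retain precision.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical transposed-convolution filter, laid out (out_channels, kh, kw, in_channels).
# Channel magnitudes differ a lot, which is exactly where per-channel scales help.
weights = rng.normal(size=(3, 2, 2, 4)).astype(np.float32)
weights *= np.array([0.02, 0.5, 1.5], dtype=np.float32).reshape(3, 1, 1, 1)

def quantize(w, scale):
    """Symmetric int8 quantization with the given scale(s)."""
    return np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Per-tensor: a single scale, dominated by the largest-magnitude channel.
per_tensor_scale = np.abs(weights).max() / 127.0
q_tensor = quantize(weights, per_tensor_scale)

# Per-channel: one scale per output channel.
per_channel_scales = np.abs(weights).max(axis=(1, 2, 3), keepdims=True) / 127.0
q_channel = quantize(weights, per_channel_scales)

print("per-tensor scale:", per_tensor_scale)
print("per-channel scales:", per_channel_scales.ravel())
print("int8 levels used by the smallest channel (per-tensor):", np.unique(q_tensor[0]).size)
print("int8 levels used by the smallest channel (per-channel):", np.unique(q_channel[0]).size)
```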
Does TFLite support TRANSPOSE_CONV with per-channel quantization? Last time I looked at it, it wasn't supported there, thus I didn't implement it in XNNPACK.
Does TFLite support TRANSPOSE_CONV with per-channel quantization?
Yes it does. The TRANSPOSE_CONV kernel supports both, but internally it will actually convert to per-channel in both cases using this code path. The MLIR converter will also always output per-channel quantized weights as far as I can tell.
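If it helps to reproduce this, here is a rough sketch of producing an int8 TFLite model that contains a transposed convolution, so the converter output can be inspected. The layer sizes, representative dataset, and output file name are placeholders, and post-training quantization is used instead of QAT just to keep the snippet short:

```python
import numpy as np
import tensorflow as tf

# Tiny placeholder model with a single transposed convolution.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 4)),
    tf.keras.layers.Conv2DTranspose(8, 3, strides=2, padding="same"),
])

def representative_dataset():
    # Random calibration data, purely illustrative.
    for _ in range(16):
        yield [np.random.rand(1, 8, 8, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```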
Does TFLite support TRANSPOSE_CONV with per-channel quantization?
@Maratyszcza I double-checked this again, and indeed TFLite supports per-channel quantization, and the current converter will always generate per-channel quantized transposed convolutions when using QAT.
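For anyone who wants to verify this on their own model, a small sketch that lists which tensors in a converted model carry per-channel scales (the model path is the placeholder file from the sketch above; any .tflite file works):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    qp = detail["quantization_parameters"]
    # More than one scale on a tensor indicates per-channel quantization.
    if len(qp["scales"]) > 1:
        print(detail["name"],
              "per-channel on axis", qp["quantized_dimension"],
              "with", len(qp["scales"]), "scales")
```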