XNNPACK
Support int8 transposed convolutions with per-channel weight quantization
TFLite uses int8 per-channel weight quantization for transposed convolutions. While XNNPACK includes a fast transposed convolution operation, it only supports per-tensor weight quantization (i.e. a single quantization scale for the whole weight tensor), which means transposed convolutions in a TFLite int8 QAT model are currently not supported by XNNPACK and won't be accelerated.
It would be excellent if XNNPACK added support for per-channel quantized weights to the transposed convolution op, matching the behaviour of the regular convolution.
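For context, here is a minimal sketch of the difference being requested. The filter shape and the output-channel-first layout are made up for illustration, not taken from XNNPACK or TFLite: per-tensor quantization uses one scale for the whole filter, while per-channel quantization keeps one scale per output channel, so small-magnitude channels retain precision.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical transposed-convolution filter, laid out (out_channels, kh, kw, in_channels).
# Channel magnitudes differ a lot, which is exactly where per-channel scales help.
weights = rng.normal(size=(3, 2, 2, 4)).astype(np.float32)
weights *= np.array([0.02, 0.5, 1.5], dtype=np.float32).reshape(3, 1, 1, 1)

def quantize(w, scale):
    """Symmetric int8 quantization with the given scale(s)."""
    return np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Per-tensor: a single scale, dominated by the largest-magnitude channel.
per_tensor_scale = np.abs(weights).max() / 127.0
q_tensor = quantize(weights, per_tensor_scale)

# Per-channel: one scale per output channel.
per_channel_scales = np.abs(weights).max(axis=(1, 2, 3), keepdims=True) / 127.0
q_channel = quantize(weights, per_channel_scales)

print("per-tensor scale:", per_tensor_scale)
print("per-channel scales:", per_channel_scales.ravel())
print("int8 levels used by the smallest channel (per-tensor):", np.unique(q_tensor[0]).size)
print("int8 levels used by the smallest channel (per-channel):", np.unique(q_channel[0]).size)
```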
Does TFLite support TRANSPOSE_CONV with per-channel quantization? Last time I looked at it, it wasn't supported there, thus I didn't implement it in XNNPACK.
Does TFLite support TRANSPOSE_CONV with per-channel quantization?
Yes it does. The TRANSPOSE_CONV kernel supports both, but internally it will actually convert to per-channel in both cases using this code path. The MLIR converter will also always output per-channel quantized weights as far as I can tell.
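If it helps to reproduce this, here is a rough sketch of producing an int8 TFLite model that contains a transposed convolution, so the converter output can be inspected. The layer sizes, representative dataset, and output file name are placeholders, and post-training quantization is used instead of QAT just to keep the snippet short:

```python
import numpy as np
import tensorflow as tf

# Tiny placeholder model with a single transposed convolution.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 4)),
    tf.keras.layers.Conv2DTranspose(8, 3, strides=2, padding="same"),
])

def representative_dataset():
    # Random calibration data, purely illustrative.
    for _ in range(16):
        yield [np.random.rand(1, 8, 8, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```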
Does TFLite support TRANSPOSE_CONV with per-channel quantization?
@Maratyszcza I double-checked this again, and indeed TFLite supports per-channel quantization, and the current converter will always generate per-channel quantized transposed convolutions when using QAT.
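For anyone who wants to verify this on their own model, a small sketch that lists which tensors in a converted model carry per-channel scales (the model path is the placeholder file from the sketch above; any .tflite file works):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    qp = detail["quantization_parameters"]
    # More than one scale on a tensor indicates per-channel quantization.
    if len(qp["scales"]) > 1:
        print(detail["name"],
              "per-channel on axis", qp["quantized_dimension"],
              "with", len(qp["scales"]), "scales")
```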