Missing support for vectorizing quantized convolution ops
Follow-up from https://github.com/google/iree/issues/8411: quantized convolution ops are not vectorized. This introduces temporary buffer allocations because of type mismatches. We landed https://github.com/google/iree/pull/8526 as a workaround. Ideally, we'd like to vectorize those operations as well.
We have a flow to vectorize convolution ops today; the missing piece is a pattern that converts the quantized version into a normal version, as we do for matmul (or, if necessary, extending the vectorization logic to directly account for the zero points). A sketch of the underlying algebra follows below.
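To make the intended rewrite concrete, here is a minimal NumPy sketch of the zero-point algebra that lets a quantized convolution be expressed as a regular convolution plus cheap correction terms. This is a hypothetical illustration, not IREE code; `conv1d` and all names below are invented for this example.

```python
import numpy as np

def conv1d(x, w):
    """Plain valid-mode 1-D correlation (no kernel flip), int64 accumulation."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)],
                    dtype=np.int64)

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=16)     # quantized input values
w = rng.integers(-128, 128, size=4)   # quantized filter values
zx, zw = 128, 3                       # input / filter zero points
k = len(w)

# Direct form: subtract the zero points first, then convolve.
direct = conv1d(x - zx, w - zw)

# Decomposed form: one regular convolution plus correction terms:
#   conv(x - zx, w - zw) = conv(x, w) - zw * window_sum(x)
#                          - zx * sum(w) + zx * zw * k
window_sum = conv1d(x, np.ones(k, dtype=np.int64))  # a per-window reduction
decomposed = conv1d(x, w) - zw * window_sum - zx * w.sum() + zx * zw * k

assert np.array_equal(direct, decomposed)
```

The decomposed form only ever convolves raw integer tensors, so the already-vectorized convolution path could apply to it; the corrections are simple reductions.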
It does not block quantized model exploration. We can prioritize this issue once quantized convolution-based models are on our tracking list; filing this issue for tracking purposes.
Unassigned myself, as I don't currently have time to work on this P1 issue.
@vmurali, please coordinate with @rsuderman.
Yes, this should be working and committed. We can decompose quantized convolutions into regular integer convolutions, with additional reductions and average pooling layers.
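As a rough illustration of the "reductions and average pooling" phrasing: the per-window input sum in the correction term above is equivalent to an average pool scaled by the window size. Again a hedged NumPy sketch with invented names (`avg_pool1d`), not IREE's actual lowering.

```python
import numpy as np

def avg_pool1d(x, k):
    """Valid-mode 1-D average pooling with stride 1."""
    return np.array([x[i:i + k].mean() for i in range(len(x) - k + 1)])

x = np.arange(10, dtype=np.int64)
k = 4
window_sum = np.array([x[i:i + k].sum() for i in range(len(x) - k + 1)])
# The per-window reduction equals k times an average pool over the same windows.
assert np.allclose(window_sum, k * avg_pool1d(x, k))
```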