[QD8] Size the dynamic quantize parameter correctly
Issue
- When a subgraph is created with a QD8 tensor, only the size of the actual tensor data is recorded in the value. Reserving and planning memory for that tensor's dynamic quant parameters is deferred to the first inference, where the number of quant parameters is computed from the value's current shape, which may be smaller than the shape the subgraph was created with.
- Memory replanning, which is triggered by changes in tensor data size, should in principle cover the dynamic quant parameter memory as well. However, if the subgraph was created with a larger size than the current one, a change in the current size between inferences does not trigger any replanning.
As a result, the bug occurs when the current tensor batch size (derived from num_nonbatch_dims) exceeds the memory planned for dynamic quant params without retriggering memory planning. xnn_compute_pad_qd8_params then writes the last good quant parameter into the padded quant parameter buffer past its end, causing a heap-buffer-overflow (caught by ASAN).
In short, this happens because memory for the dynamic quant parameters was reserved based only on the first inference shape, not the potentially much larger subgraph shape.
Fix
This commit applies the same size-tracking logic used for the data size to the dynamic quant parameter size, but only for QD8 values and only for the convert op. The size is derived from the subgraph shape and updated during memory planning if necessary.
Compared to triggering memory replanning based on the dynamic quant parameter size, this approach avoids replanning entirely when the user has created the subgraph with an upper-bound shape.
WIP Items
- Add tests for the convert op that reproduce this bug.
cc @alankelly
Thanks for finding and fixing this.
Let me know when it's ready for review
Alan, I think this is ready for a review.
Thanks, Alan, for merging the last one. I've updated this PR with the change; please take a look whenever you have time.