webml-polyfill
[API] Support QUANTIZE and DEQUANTIZE operations
What is the purpose of these two operators in an int8 model? Could they be handled by pre-processing and post-processing JS code instead?
Each op has a y_scale, and the first layer of the int8 model has an x_scale used to quantize the model's input, since the input type is f32. I suppose they could be handled by pre-processing in JS code, with the model's input type being TENSOR_QUANT8_ASYMM_SIGNED instead of TENSOR_FLOAT32.
I suppose the quantization of the input could be implemented either in JS or in DNNL. If we decide to quantize the input in JS code, we could close this and the other related issues. @huningxin Would you like to share your opinion?
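For reference, a minimal sketch of what such JS pre/post-processing could look like, using the standard asymmetric affine quantization formula (function names are hypothetical; `scale` and `zeroPoint` would come from the model's x_scale/y_scale parameters):

```javascript
// Quantize f32 input to int8 (e.g. for TENSOR_QUANT8_ASYMM_SIGNED):
// q = clamp(round(x / scale) + zeroPoint, -128, 127)
function quantizeToInt8(float32Data, scale, zeroPoint) {
  const out = new Int8Array(float32Data.length);
  for (let i = 0; i < float32Data.length; ++i) {
    const q = Math.round(float32Data[i] / scale) + zeroPoint;
    out[i] = Math.min(127, Math.max(-128, q)); // clamp to int8 range
  }
  return out;
}

// Dequantize int8 output back to f32: x = (q - zeroPoint) * scale
function dequantizeFromInt8(int8Data, scale, zeroPoint) {
  const out = new Float32Array(int8Data.length);
  for (let i = 0; i < int8Data.length; ++i) {
    out[i] = (int8Data[i] - zeroPoint) * scale;
  }
  return out;
}
```

With helpers like these, the model graph itself would only see int8 tensors, and the QUANTIZE/DEQUANTIZE ops at the boundaries could be dropped.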