NNCF inference with BN folding?
Will BN folding cause accuracy loss during the inference phase? We want to fold BN into the weights to improve inference speed, because the BN layer costs extra memory and instruction cycles.
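For reference, this is the arithmetic BN folding performs. Below is a minimal PyTorch sketch of folding a BatchNorm layer into the preceding convolution (illustrative only, not the NNCF or OpenVINO implementation):

```python
import torch

def fold_bn_into_conv(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    """Fold BatchNorm statistics into a copy of the preceding conv layer.

    y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
      = (gamma / sqrt(var + eps)) * W * x + (gamma / sqrt(var + eps)) * (b - mean) + beta
    """
    fused = torch.nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                            conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # one factor per output channel
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

In floating point this transformation is exact (up to rounding), which is why folding by itself does not change accuracy.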
Hi @xiaoyaopeng,
OpenVINO folds BN for you automatically, for both floating-point and quantized models, and there is no accuracy degradation if you use NNCF.
Thanks.
One more thing: OpenVINO will fold BN into the weights? If we use per-tensor quantization, what will OpenVINO do to the weights? Will the scale parameter be folded too? Note that the dimension of the BN parameters is the same as the channel count of the weights, but we only have one scale parameter per layer.
This can actually be a problem, because OpenVINO mostly fuses BN into the FakeQuantize parameters. To be honest, the HW we have supports per-channel quantization of weights as the most accurate scheme.
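To illustrate why per-channel quantization of weights plays well with BN folding, here is a small sketch (symmetric quantization assumed; the function name is illustrative). Per-channel quantization keeps one scale per output channel, matching the dimension of the folded BN factors, while per-tensor quantization collapses everything into a single scale:

```python
import torch

def weight_scales(weight: torch.Tensor, per_channel: bool, num_bits: int = 8) -> torch.Tensor:
    """Symmetric quantization scales for a conv weight of shape [out_ch, in_ch, kh, kw]."""
    qmax = 2 ** (num_bits - 1) - 1
    if per_channel:
        # one scale per output channel -- same shape as the folded BN gamma factor
        max_abs = weight.abs().amax(dim=(1, 2, 3))
    else:
        # a single scale for the whole tensor; channels whose folded BN factor
        # is small lose resolution relative to the channel with the largest range
        max_abs = weight.abs().amax()
    return max_abs / qmax

w = torch.randn(16, 3, 3, 3)
print(weight_scales(w, per_channel=True).shape)   # torch.Size([16])
print(weight_scales(w, per_channel=False).shape)  # torch.Size([])
```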
In the classification example, compress_ctrl can export an ONNX model via the _export_to_onnx() function in the PTExporter class. Will the quantization parameters be exported to the ONNX model? Can OpenVINO parse these parameters? OpenVINO has model optimizations such as BN folding, but I can't find anything in the OpenVINO docs describing whether OpenVINO can work with an NNCF-compressed model.
NNCF can export quantization parameters either to standard ONNX with QuantizeLinear and DequantizeLinear operations, or to custom ONNX with the FakeQuantize op from the OpenVINO domain. Both are recognizable by OpenVINO. As for BN folding, that is the responsibility of the Model Optimizer component within OpenVINO.
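For context, a minimal sketch of driving the export from the public NNCF API (the config values and model are illustrative, import paths differ slightly between NNCF versions, and range initialization via a dataloader is omitted for brevity; in the classification sample this export goes through PTExporter under the hood):

```python
import torch
from torchvision.models import resnet18
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Quantization config; values here are illustrative defaults.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

model = resnet18()
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Export the model with its quantization parameters embedded in the ONNX graph,
# which OpenVINO's Model Optimizer can then consume (including BN folding).
compression_ctrl.export_model("resnet18_int8.onnx")
```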