
Are variables of batch norm layers folded during inference?

jakc4103 opened this issue 4 years ago · 2 comments

Hi, thanks again for sharing this repo for reproducing the awesome results.

I am curious whether the BatchNorm layers are folded into the preceding Conv or FC layers during inference. I ran both static and retrain mode for mobilenetv2. During inference, I found that the variables of BatchNorm (mean/var/gamma/beta) are filled with some values instead of 1s or 0s, and are still involved in the computation graph. Is that working as intended? (I load the quantized model with the .ckpt and .pb files.)

jakc4103 — May 04 '20 07:05

Hi @jakc4103, the folding of BatchNorm layers in Graffitist is not an in-place operation. By that I mean the BN parameters (mean, var, gamma, beta, etc.) are retained as variables, and only the graph is modified to fold them along with the weights / biases. As a result, the folded weights and biases are not modified in-place; rather, they are computed at run-time using ops that implement the folding. If you're interested in getting the final folded & quantized weights / biases of a convolutional layer, you may pass the dump_quant_params=True argument to the quantize transform like this:

python $groot/graffitize.pyc \
    --in_graph $in_graph \
    --out_graph $infquant_graph \
    --inputs $input_node \
    --outputs $output_node \
    --input_shape $input_shape \
    --transforms 'fix_input_shape' \
                 'fold_batch_norms' \
                 'remove_training_nodes' \
                 'strip_unused_nodes' \
                 'preprocess_layers' \
                 'quantize(dump_quant_params=True, ...)'

This will save out an HDF5 dump of quantized weights and biases, which will have BN params folded in and quantized appropriately.
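For intuition, the fold that these run-time ops implement is the standard batch-norm fold arithmetic. Below is a minimal NumPy sketch of that arithmetic (illustrative only, not Graffitist's actual ops; the tensor layout and eps value are assumptions):

import numpy as np

def fold_batch_norm(W, b, gamma, beta, mean, var, eps=1e-3):
    # Equivalent weights/bias for conv(x, W) + b followed by batch norm.
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    W_fold = W * scale                   # assumes W shape (kh, kw, cin, cout); broadcasts over cout
    b_fold = beta + (b - mean) * scale   # shift absorbs the BN mean and beta
    return W_fold, b_fold

Because the fold is expressed as graph ops rather than applied in-place, the original BN variables remain in the checkpoint, and the folded values are recomputed every time the graph executes.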

sjain-stanford — May 04 '20 19:05

@sjain-stanford thanks for the kind reply!

Just to be sure: I found the dumped weights are in integer format, while the whole training and inference pipeline is done using FakeQuant (computations in FP32). Is that right?
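For concreteness, my mental model of the FakeQuant ops is a quantize-then-dequantize that stays in FP32, roughly like this sketch (illustrative only, not Graffitist code; symmetric quantization and the scale handling are my assumptions):

import numpy as np

def fake_quantize(x, scale, num_bits=8):
    # Snap x onto an integer grid, clip to the representable range, then dequantize.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)  # I assume these integers are what the dump contains
    return q * scale                              # computation continues in FP32

i.e., I assume the integers in the HDF5 dump correspond to q above, while the graph itself keeps computing in FP32.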

Also, are there any other debugging or experimental flags for quantize()? (e.g., setting the weight calibration method to MAX for training mode)

jakc4103 — May 05 '20 05:05