graffitist
Are the variables of batch norm layers folded during inference?
Hi, thanks again for sharing this repo for reproducing the awesome results.
I am curious whether the BatchNorm layers are folded into the preceding Conv or FC layers at inference time. I ran both static and retrain modes for MobileNetV2. At inference, I found that the BatchNorm variables (mean/var/gamma/beta) are filled with values other than 1s or 0s, and they still take part in the computation graph. Is that working as intended? (I load the quantized model from the .ckpt and .pb files.)
Hi @jakc4103,
The folding of BatchNorm layers in Graffitist is not an in-place operation. By that I mean the BN parameters (mean, var, gamma, beta, etc.) are retained as variables, and only the graph is modified to fold them along with the weights / biases. As a result, the folded weights and biases are not modified in place; rather, they are computed at run-time by ops that implement the folding.
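For intuition, here is a minimal NumPy sketch of the arithmetic those folding ops compute at run-time. This is the standard BN folding identity; the function and variable names are illustrative, not Graffitist's internal ops:

import numpy as np

# Standard BN folding: for z = conv(x, w) + b followed by
# y = gamma * (z - mean) / sqrt(var + eps) + beta, the BN is absorbed as
# y = conv(x, w_fold) + b_fold. (Illustrative sketch, not Graffitist code.)
def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
    scale = gamma / np.sqrt(var + eps)  # per-output-channel scale
    w_fold = w * scale                  # w: [kh, kw, cin, cout]; scale broadcasts over cout
    b_fold = (b - mean) * scale + beta
    return w_fold, b_fold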
If you're interested in getting the final folded & quantized weights / biases of a convolutional layer, you may pass the dump_quant_params=True argument to the quantize transform like this:
python $groot/graffitize.pyc \
--in_graph $in_graph \
--out_graph $infquant_graph \
--inputs $input_node \
--outputs $output_node \
--input_shape $input_shape \
--transforms 'fix_input_shape' \
'fold_batch_norms' \
'remove_training_nodes' \
'strip_unused_nodes' \
'preprocess_layers' \
'quantize(dump_quant_params=True, ...)'
This will save out an HDF5 dump of quantized weights and biases, which will have BN params folded in and quantized appropriately.
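If it helps, the dump can be inspected with h5py. A hedged sketch (the file name and key layout below are assumptions for illustration; check the actual dump Graffitist writes for the exact structure):

import h5py

# "quant_params.h5" is a hypothetical name; use the path of the dump
# produced by quantize(dump_quant_params=True, ...).
with h5py.File("quant_params.h5", "r") as f:
    def show(name, obj):
        # Print every dataset's name, shape and dtype to see the layout.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)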
@sjain-stanford Thanks for the kind reply!
Just to be sure: I found that the dumped weights are in integer format, while the whole training and inference pipelines run with FakeQuant (computations in FP32). Is that right?
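To illustrate what I mean by FakeQuant, a rough sketch (made-up names; symmetric per-tensor scaling assumed):

import numpy as np

# Quantize-dequantize in FP32: values are snapped to the integer grid but
# the graph still computes in floating point. (Illustrative only.)
def fake_quant(x, scale, bits=8):
    qmin, qmax = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)  # integer grid (like the dump)
    return q * scale                              # dequantized FP32 used in the graph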
Also, are there any other debugging or experimental flags for quantize()? (e.g. setting the weight calibration method to MAX for training mode)