hls4ml
Wrong prediction of C Simulation compared to QKeras?
First of all, thank you for the amazing work. The library is very useful. While doing some testing I noticed inconsistencies in the predictions between the QKeras model and the generated C++. I have created a very simple example with only a convolutional layer to show the issue. I obtain the C++ prediction through the C Simulation where I specify the input in the file tb_data/tb_input_features.dat.
The problem is that for an input of a certain magnitude, the prediction does not match QKeras. My intuition is that this is due to overflow of the data type; however, the fixed-point type I use has at least 5 bits for the integer part, which should give a range of around [-32, +31]. If I manually multiply the kernel weights by the input data, that range should be enough to avoid overflow. Perhaps I'm misunderstanding some concepts.
hls4ml version: '0.8.1'
QKeras model:
import tensorflow
from tensorflow.keras.models import Model
from qkeras import QConv2DBatchnorm

inputs = tensorflow.keras.layers.Input(shape=[3, 3, 1])
outputs = QConv2DBatchnorm(
    1,
    kernel_size=(3, 3),
    strides=(1, 1),
    padding="valid",
    kernel_quantizer="quantized_bits(16,6)",
    bias_quantizer="quantized_bits(16,6)",
    use_bias=True,
    name='fused_convbn_0',
    momentum=0.9,
    epsilon=1e-5,
)(inputs)
qmodel = Model(inputs=[inputs], outputs=[outputs], name='qkeras')
qmodel.save("test_layer.h5")
The input data is all set either to 1 or to 2. With 1 the prediction is correct; with 2 it is not.
Inside tb_data/tb_input_features.dat either:
1 1 1 1 1 1 1 1 1
or
2 2 2 2 2 2 2 2 2
Predictions:
qmodel.predict(1.0 * np.ones((3,3), np.float32).reshape(1,3,3,1)).ravel() # QKeras -> -0.4189453
# csim -> -0.418945 (correct)
qmodel.predict(2.0 * np.ones((3,3), np.float32).reshape(1,3,3,1)).ravel() # QKeras -> -0.8378906
# csim -> 0.162109 (wrong)
Weights:
fused_convbn_0.kernel shape=(3, 3, 1, 1) [-0.46035218 0.35075396 0.91544527 0.40746373 -0.3072288 -0.12625036
-0.36835948 -0.66943824 -0.16099125]
fused_convbn_0.bias shape=(1,) [0.]
fused_convbn_0._iteration shape=() [-1]
fused_convbn_0.batchnorm.gamma shape=(1,) [1.]
fused_convbn_0.batchnorm.beta shape=(1,) [0.]
fused_convbn_0.batchnorm.moving_mean shape=(1,) [0.]
fused_convbn_0.batchnorm.moving_variance shape=(1,) [1.]
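Since the bias is zero and the batchnorm parameters are an identity here, the expected outputs can be checked by hand; a minimal NumPy check using the kernel values dumped above:

```python
import numpy as np

# QKeras kernel from the dump above, flattened (bias is 0, batchnorm is identity)
k = np.array([-0.46035218, 0.35075396, 0.91544527,
              0.40746373, -0.3072288, -0.12625036,
              -0.36835948, -0.66943824, -0.16099125])

# A "valid" 3x3 conv over a constant 3x3 input collapses to input_value * k.sum()
print(1.0 * k.sum())  # approx -0.4190 -> matches QKeras -0.4189453
print(2.0 * k.sum())  # approx -0.8379 -> matches QKeras -0.8378906
```

So both results are well inside [-32, +31], which is why plain overflow of the unscaled weights cannot explain the mismatch.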
I realized that the weights in the C++ code are much larger than the ones in QKeras, in fact by a factor of 64. Is this correct?
//Numpy array shape [3, 3, 1, 1]
//Min -0.669433593750
//Max 0.915435791016
//Number of zeros 0
#ifndef W2_H_
#define W2_H_
#ifndef __SYNTHESIS__
weight2_t w2[9];
#else
weight2_t w2[9] = {-29.462890625, 22.447265625, 58.587890625, 26.078125000, -19.662109375, -8.080078125, -23.574218750, -42.843750000, -10.302734375};
#endif
#endif
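The factor of 64 can be verified by dividing the generated C++ weights by 64 and comparing against the QKeras kernel (a quick NumPy check using the values above; the small residual differences come from quantization):

```python
import numpy as np

# Weights as emitted in the generated w2.h
w_cpp = np.array([-29.462890625, 22.447265625, 58.587890625,
                  26.078125, -19.662109375, -8.080078125,
                  -23.57421875, -42.84375, -10.302734375])

# Original QKeras kernel
w_qkeras = np.array([-0.46035218, 0.35075396, 0.91544527,
                     0.40746373, -0.3072288, -0.12625036,
                     -0.36835948, -0.66943824, -0.16099125])

# Undoing the factor-64 scale recovers the QKeras weights (up to quantization)
print(np.allclose(w_cpp / 64.0, w_qkeras, atol=1e-3))  # True
```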
I found the reason for the scaled weights. In QKeras, if the quantizer's alpha is None, the weights are scaled by a factor equal to the data type's scale, and hls4ml inserts this layer to undo the scaling:
nnet::normalize<layer2_t, result_t, config4>(layer2_out, layer4_out, s4, b4); // fused_convbn_0_alpha
However, the fact that the scaling causes the overflow seems like a bug to me.
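Assuming the conv result is held in an ap_fixed<16,6>-style type with the default wrap-around (AP_WRAP) overflow mode, which is an assumption about the generated config and not confirmed from the project files, the wrong csim value 0.162109 can be reproduced exactly: the scaled accumulation overflows the [-32, 32) range, wraps, and is then divided back by 64 by the alpha layer.

```python
import numpy as np

# Scaled weights as emitted by hls4ml (64x larger than the QKeras kernel)
w_scaled = np.array([-29.462890625, 22.447265625, 58.587890625,
                     26.078125, -19.662109375, -8.080078125,
                     -23.57421875, -42.84375, -10.302734375])

def wrap(x, lo=-32.0, hi=32.0):
    # Model AP_WRAP behaviour of a 6-integer-bit fixed-point type:
    # values wrap modulo the 64-unit span of [-32, 32)
    return (x - lo) % (hi - lo) + lo

acc1 = wrap(1.0 * w_scaled.sum())   # -26.8125, still in range
acc2 = wrap(2.0 * w_scaled.sum())   # -53.625 overflows and wraps to 10.375
print(acc1 / 64.0)  # -0.4189453125 -> the correct csim value for input 1
print(acc2 / 64.0)  # 0.162109375   -> exactly the wrong csim output for input 2
```

This matches both observed csim values, which supports the overflow-of-scaled-weights explanation.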
To avoid this I explicitly set alpha=1: quantized_bits(16,6,alpha=1)
kernel_quantizer="quantized_bits(16,6)"
is the problem. By default, QKeras quantizers allow the training of a scaling factor, which tends to mess everything up (with hls4ml). You will need to set kernel_quantizer="quantized_bits(16,6,alpha=1.)".