
Wrong prediction of C Simulation compared to QKeras?

Open lcit opened this issue 11 months ago • 3 comments

First of all, thank you for the amazing work. The library is very useful. While doing some testing, I noticed inconsistencies between the predictions of the QKeras model and the generated C++. I have created a very simple example with only a convolutional layer to show the issue. I obtain the C++ prediction through C Simulation, where I specify the input in the file tb_data/tb_input_features.dat.

The problem is that for an input of a certain magnitude, the prediction does not match QKeras. My intuition is that this is due to overflow of the data type; however, the fixed-point type I use has at least 5 bits for the integer part, which should give a range of around [-32, +31]. If I manually multiply the kernel weights with the input data, the range should be enough to avoid overflow. Perhaps I'm misunderstanding some concepts.
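This back-of-the-envelope check can be reproduced in plain Python using the kernel weights quoted further down in this issue; a 3x3 "valid" convolution over a constant 3x3 input collapses to sum(w) * value:

```python
# Unscaled QKeras kernel weights (values quoted later in this issue)
w = [-0.46035218, 0.35075396, 0.91544527, 0.40746373, -0.3072288,
     -0.12625036, -0.36835948, -0.66943824, -0.16099125]

# A 3x3 valid convolution over a constant 3x3 input reduces to sum(w) * value
for value in (1.0, 2.0):
    print(value, sum(w) * value)  # ~ -0.419 and ~ -0.838, both well inside [-32, +31]
```

So neither expected output comes anywhere near the claimed fixed-point range.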

hls4ml version: '0.8.1'

test_layer.zip

QKeras model:

import numpy as np
import tensorflow
from tensorflow.keras.models import Model
from qkeras import QConv2DBatchnorm

inputs = tensorflow.keras.layers.Input(shape=[3,3,1])
outputs = QConv2DBatchnorm(
    1,
    kernel_size=(3,3),
    strides=(1, 1),
    padding="valid",
    kernel_quantizer="quantized_bits(16,6)",
    bias_quantizer="quantized_bits(16,6)",
    use_bias=True,
    name='fused_convbn_0',
    momentum=0.9,
    epsilon=1e-5
)(inputs)
qmodel = Model(inputs=[inputs], outputs=[outputs], name='qkeras')
qmodel.save("test_layer.h5")

The input data is either all 1s or all 2s. With 1 the results are correct; with 2 they are not. Inside tb_data/tb_input_features.dat I put either: 1 1 1 1 1 1 1 1 1 or 2 2 2 2 2 2 2 2 2

Predictions:

qmodel.predict(1.0 * np.ones((3,3), np.float32).reshape(1,3,3,1)).ravel()
# QKeras -> -0.4189453
# csim   -> -0.418945 (correct)

qmodel.predict(2.0 * np.ones((3,3), np.float32).reshape(1,3,3,1)).ravel()
# QKeras -> -0.8378906
# csim   -> 0.162109 (wrong)

Weights:

fused_convbn_0.kernel shape=(3, 3, 1, 1) [-0.46035218  0.35075396  0.91544527  0.40746373 -0.3072288  -0.12625036
 -0.36835948 -0.66943824 -0.16099125]
fused_convbn_0.bias shape=(1,) [0.]
fused_convbn_0._iteration shape=() [-1]
fused_convbn_0.batchnorm.gamma shape=(1,) [1.]
fused_convbn_0.batchnorm.beta shape=(1,) [0.]
fused_convbn_0.batchnorm.moving_mean shape=(1,) [0.]
fused_convbn_0.batchnorm.moving_variance shape=(1,) [1.]

lcit · Mar 08 '24 13:03

I realized that the weights in the C++ code are much larger than the ones in QKeras, in fact by a factor of 64. Is this correct?

//Numpy array shape [3, 3, 1, 1]
//Min -0.669433593750
//Max 0.915435791016
//Number of zeros 0

#ifndef W2_H_
#define W2_H_

#ifndef __SYNTHESIS__
weight2_t w2[9];
#else
weight2_t w2[9] = {-29.462890625, 22.447265625, 58.587890625, 26.078125000, -19.662109375, -8.080078125, -23.574218750, -42.843750000, -10.302734375};
#endif

#endif
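The factor can be checked with a few lines of plain Python: dividing each generated C++ weight by 64 (= 2**6, matching the 6 integer bits in quantized_bits(16,6)) recovers the QKeras weights up to quantization error:

```python
cpp_w = [-29.462890625, 22.447265625, 58.587890625, 26.078125000,
         -19.662109375, -8.080078125, -23.574218750, -42.843750000,
         -10.302734375]
qkeras_w = [-0.46035218, 0.35075396, 0.91544527, 0.40746373, -0.3072288,
            -0.12625036, -0.36835948, -0.66943824, -0.16099125]

scale = 2 ** 6  # 64
for c, q in zip(cpp_w, qkeras_w):
    assert abs(c / scale - q) < 1e-4  # equal up to quantization error
print("all weights match after dividing by", scale)
```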

lcit · Mar 09 '24 14:03

I found the reason for the scaled weights. In QKeras, if the quantizer's alpha is None, the weights are scaled by a factor equal to the data type's scale, and hls4ml inserts this layer to undo the scaling:

nnet::normalize<layer2_t, result_t, config4>(layer2_out, layer4_out, s4, b4); // fused_convbn_0_alpha
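If I read the template right, nnet::normalize applies an elementwise affine rescale (the same operation hls4ml uses for batch normalization). A minimal pure-Python sketch of that operation, fed with the sum of the scaled weights above (-26.8125, the accumulator for the all-1s input):

```python
def normalize(data, scale, bias):
    # Elementwise affine transform: res[i] = data[i] * scale[i] + bias[i]
    return [d * s + b for d, s, b in zip(data, scale, bias)]

# With scale = 1/64 and bias = 0 this layer undoes the 2**6 weight scaling
print(normalize([-26.8125], [1.0 / 64], [0.0]))  # -> [-0.4189453125]
```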

However, the fact that this scaling causes the overflow seems like a bug to me.
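Working through the numbers supports this. Assuming the convolution result is stored in an ap_fixed<16,6>-style type (range [-32, 32)) with the default wrap-around overflow behavior, the scaled accumulator for the all-2s input, 2 * -26.8125 = -53.625, wraps to exactly the value csim reports:

```python
def wrap_fixed(x, width=16, int_bits=6):
    # Emulate an ap_fixed<width,int_bits> value with wrap-around (AP_WRAP)
    # two's-complement overflow; illustration only, not the HLS implementation.
    frac_bits = width - int_bits
    raw = int(round(x * (1 << frac_bits))) & ((1 << width) - 1)
    if raw >= 1 << (width - 1):        # negative in two's complement
        raw -= 1 << width
    return raw / (1 << frac_bits)

acc = 2.0 * -26.8125       # scaled accumulator for the all-2s input = -53.625
wrapped = wrap_fixed(acc)  # -53.625 is outside [-32, 32) and wraps to 10.375
print(wrapped / 64)        # -> 0.162109375, the "wrong" csim output
```

So the 64x weight scaling pushes an otherwise in-range value outside the intermediate type and the result wraps.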

To avoid this I explicitly set alpha=1: quantized_bits(16,6,alpha=1)

lcit · Mar 13 '24 18:03

kernel_quantizer="quantized_bits(16,6)" is the problem. By default, QKeras quantizers allow training a scaling factor, which tends to mess everything up with hls4ml. You will need to set kernel_quantizer="quantized_bits(16,6,alpha=1.)".

calad0i · Apr 16 '24 20:04