hls4ml
Number of filters limitation
Is there a limit on the number of filters in a CNN? A layer with 32 filters tends to be the bottleneck during synthesis: the synthesis is unable to complete and gets stuck at the Conv2D layer that has 32 filters.
Depends on your config. Assuming you use io_stream, the limit will be related to the strategy used, since that affects the algorithm used for the CNN kernel. If you use the Latency strategy (the default), then filt_height x filt_width x n_channels x n_filters < 4096. If you use io_parallel, well, you shouldn't be using it with large models.
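To make that limit concrete, here is a small back-of-the-envelope check (the 4096 figure is the unroll limit quoted above; the layer sizes are just illustrative):

```python
# Check whether a conv layer fits under the Latency-strategy unroll limit
def fits_latency_limit(filt_height, filt_width, n_channels, n_filters, limit=4096):
    product = filt_height * filt_width * n_channels * n_filters
    return product, product < limit

print(fits_latency_limit(3, 3, 16, 16))  # (2304, True)  -> 16 filters is fine
print(fits_latency_limit(3, 3, 16, 32))  # (4608, False) -> 32 filters exceeds the limit
```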
I have used io_stream and the strategy used is Resource. Another issue that I am facing is that when I use the config file to generate the hls4ml model from the quantized model, it results in an accuracy drop from ~75% to ~10%. I checked that the baseline quantized model predicts with an accuracy of around ~71%, but the hls4ml model drops in accuracy after synthesising the model.
Hi @wilfredkisku, can you share your model? You may look into the tracing / profiling functionality.
You can make 1D plots of the expected output vs hls4ml output like so: https://github.com/hls4ml-finn-mlperftiny/CIFAR10/blob/main/hls4ml/convert.py#L222-L233
This can help you pinpoint which layers are causing a mismatch. Then you can increase the precision of those layers. Usually it's required to either increase the precision of the outputs or the accumulators (or both).
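For reference, a minimal sketch of how that tracing could look with the hls4ml trace/profiling utilities (assuming a QKeras model `qmodel`, a test array `X_test`, and a `granularity='name'` config `hls_config_q`; adapt the names and converter arguments to your setup):

```python
import numpy as np
import hls4ml
from hls4ml.model.profiling import get_ymodel_keras

# Enable tracing for every layer so the hls4ml model records intermediate outputs
for layer in hls_config_q['LayerName'].keys():
    hls_config_q['LayerName'][layer]['Trace'] = True

hls_model = hls4ml.converters.convert_from_keras_model(
    qmodel, hls_config=hls_config_q, io_type='io_stream',
    backend='Vivado', output_dir='trace_model')
hls_model.compile()

# Layer-by-layer outputs from the hls4ml model and from the (Q)Keras model
hls_pred, hls_trace = hls_model.trace(np.ascontiguousarray(X_test[:100]))
keras_trace = get_ymodel_keras(qmodel, X_test[:100])

# Find the first layer where the two start to disagree
for name, hls_out in hls_trace.items():
    if name in keras_trace:
        diff = np.max(np.abs(np.asarray(hls_out).ravel() - np.asarray(keras_trace[name]).ravel()))
        print(f'{name:20s} max abs difference: {diff:.4f}')
```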
Thanks @jmduarte for the help. I am including the model to give a better picture of what I am trying to synthesize using hls4ml. I have been testing with an IFM bit precision of 4 and a weight precision of 12, and also 16 and 16, but the accuracies of the Keras model and the hls4ml model still differ a lot.
from tensorflow.keras.layers import Input, Activation, Add, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1
from qkeras import QActivation
from qkeras import QDense, QConv2DBatchnorm

IFM = 16
WGT = 16

def QResNet9(input_shape=(32, 32, 3), classes=10):
    img_input = Input(shape=input_shape)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_in')(img_input)
    x = QConv2DBatchnorm(16, kernel_size=(3, 3), strides=(1, 1),
                         kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                         kernel_initializer='lecun_uniform',
                         kernel_regularizer=l1(0.0001), name='conv1', use_bias=True)(x)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv1')(x)
    x = QConv2DBatchnorm(16, kernel_size=(3, 3), strides=(1, 1),
                         kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                         kernel_initializer='lecun_uniform',
                         kernel_regularizer=l1(0.0001), name='conv2', use_bias=True)(x)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool1')(x)

    x_skip = x
    x_skip = QConv2DBatchnorm(16, kernel_size=(3, 3), strides=(1, 1),
                              kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                              kernel_initializer='lecun_uniform',
                              kernel_regularizer=l1(0.0001), name='conv3', use_bias=True)(x_skip)
    x_skip = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv3_skip')(x_skip)
    x = QConv2DBatchnorm(16, kernel_size=(3, 3), strides=(1, 1),
                         kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                         kernel_initializer='lecun_uniform',
                         kernel_regularizer=l1(0.0001), name='conv4', use_bias=True)(x)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv4')(x)
    #x = QConv2DBatchnorm(16, kernel_size=(3, 3), strides=(1, 1),
    #                     kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',
    #                     kernel_regularizer=l1(0.0001), name='conv4', use_bias=True)(x)
    #x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv4')(x)
    x = Add()([x, x_skip])

    x = QConv2DBatchnorm(24, kernel_size=(3, 3), strides=(1, 1),
                         kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                         kernel_initializer='lecun_uniform',
                         kernel_regularizer=l1(0.0001), name='conv5', use_bias=True)(x)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv5')(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool2')(x)
    #x = QConv2DBatchnorm(32, kernel_size=(3, 3), strides=(1, 1),
    #                     kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',
    #                     kernel_regularizer=l1(0.0001), name='conv6', use_bias=True)(x)
    #x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv6')(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool3')(x)

    x_skip = x
    x_skip = QConv2DBatchnorm(24, kernel_size=(3, 3), strides=(1, 1),
                              kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                              kernel_initializer='lecun_uniform',
                              kernel_regularizer=l1(0.0001), name='conv6', use_bias=True)(x_skip)
    x_skip = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv6_skip')(x_skip)
    x = QConv2DBatchnorm(24, kernel_size=(3, 3), strides=(1, 1),
                         kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
                         kernel_initializer='lecun_uniform',
                         kernel_regularizer=l1(0.0001), name='conv7', use_bias=True)(x)
    x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv7')(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool4')(x)
    #x = QConv2DBatchnorm(32, kernel_size=(3, 3), strides=(1, 1),
    #                     kernel_quantizer="quantized_bits(" + str(WGT) + ",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',
    #                     kernel_regularizer=l1(0.0001), name='conv8', use_bias=True)(x)
    #x = QActivation('quantized_relu(' + str(IFM) + ',0)', name='relu_conv8')(x)
    x = Add()([x, x_skip])
    x = MaxPooling2D()(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool5')(x)

    x = Flatten()(x)
    x = Dense(10, name='output_dense')(x)
    x_out = Activation('softmax', name='output_softmax')(x)

    qmodel = Model(inputs=[img_input], outputs=[x_out], name='qkeras')
    return qmodel
Accuracy Keras: 0.7033333333333334
Accuracy hls4ml: 0.11666666666666667
I am including the configuration for hls4ml that I have used.
import hls4ml

# Then the QKeras model
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'
hls_config_q = hls4ml.utils.config_from_keras_model(qmodel, granularity='name')
hls_config_q['Model']['Strategy'] = 'Resource'
hls_config_q['Model']['ReuseFactor'] = 144
hls_config_q['Model']['Precision'] = 'ap_fixed<16,6>'
hls_config_q['LayerName']['conv1']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv1']['ReuseFactor'] = 108
hls_config_q['LayerName']['conv2']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv2']['ReuseFactor'] = 144
hls_config_q['LayerName']['conv3']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv3']['ReuseFactor'] = 144
hls_config_q['LayerName']['conv4']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv4']['ReuseFactor'] = 144
hls_config_q['LayerName']['conv5']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv5']['ReuseFactor'] = 144
hls_config_q['LayerName']['conv7']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv7']['ReuseFactor'] = 144
hls_config_q['LayerName']['conv6']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv6']['ReuseFactor'] = 144
hls_config_q['LayerName']['output_dense']['Strategy'] = 'Resource'
hls_config_q['LayerName']['output_dense']['ReuseFactor'] = 160
hls_config_q['LayerName']['output_softmax']['Strategy'] = 'Stable'
plotting.print_dict(hls_config_q)
cfg_q = hls4ml.converters.create_config(backend='Vivado')
cfg_q['IOType'] = 'io_stream' # Must set this if using CNNs!
cfg_q['HLSConfig'] = hls_config_q
cfg_q['KerasModel'] = qmodel
cfg_q['OutputDir'] = 'quantized_cnn_model_C/'
cfg_q['XilinxPart'] = 'xczu7ev-ffvc1156-2-e'
#cfg_q['XilinxPart'] = 'xcu250-figd2104-2L-e'
hls_model_q = hls4ml.converters.keras_to_hls(cfg_q)
hls_model_q.compile()
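For completeness, a small sketch of how the two accuracies quoted above were compared after `compile()` (assuming `X_test`/`y_test` are the preprocessed, one-hot CIFAR-10 test arrays; the names are placeholders):

```python
import numpy as np

y_keras = qmodel.predict(X_test)
y_hls = hls_model_q.predict(np.ascontiguousarray(X_test))

acc_keras = np.mean(np.argmax(y_keras, axis=1) == np.argmax(y_test, axis=1))
acc_hls = np.mean(np.argmax(y_hls, axis=1) == np.argmax(y_test, axis=1))
print(f'Accuracy Keras:  {acc_keras:.4f}')
print(f'Accuracy hls4ml: {acc_hls:.4f}')
```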
A few other details I want to include are:
- I am using Ubuntu installed in a VM with ~45 GB of RAM allocated to it.
- If I use more than 16 filters for any of the layers, the synthesis build gets stuck during loop unrolling for that convolutional layer. Is there a way I can increase the number of filters and still be able to synthesize without this issue?
- I am using a Xilinx UltraScale+ MPSoC ZCU104.
Hi, have you solved your problem? I also ran into the accuracy problem when I tested on a ResNet.
Hi, have you solved your problem? Maybe you can compare the output of the "output_softmax" layer between the hls_model and the Keras model; this is my problem: https://github.com/fastmachinelearning/hls4ml/issues/590
@liuhao-97 thank you for the reply. No, I could not get it corrected; it still has the accuracy drop. Is there anything else to rectify the issue that you have pointed out?
Hi, have you tried with a full-precision model (ap_fixed<32,16>)? I mean, don't quantize the model and set the hls config to ap_fixed<32,16>. Maybe you can check the output of the last softmax layer between the Keras model and the hls model.
For me, I found there might be some problem with the softmax layer. I printed the output of the dense layer and it works fine, but for the softmax layer the output is totally different. If you check this link https://github.com/hls4ml-finn-mlperftiny/CIFAR10/blob/main/hls4ml/convert.py you will find that it removes the softmax layer, so I assume there might be some problem with the softmax layer.
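If it helps, a rough sketch of the workaround used in the linked script: drop the final softmax and compare logits instead (softmax is monotonic, so the predicted class is unchanged). `logits_model` and `hls_model_logits` are hypothetical names; the layer names follow the model posted above:

```python
import numpy as np
import tensorflow as tf

# Rebuild the Keras model up to the dense layer, i.e. without the softmax
logits_model = tf.keras.Model(qmodel.input, qmodel.get_layer('output_dense').output)
y_logits_keras = logits_model.predict(X_test)

# Convert `logits_model` with hls4ml the same way as before, then compare:
# y_logits_hls = hls_model_logits.predict(np.ascontiguousarray(X_test))
# agreement = np.mean(np.argmax(y_logits_hls, axis=1) == np.argmax(y_logits_keras, axis=1))
# If the argmax agrees, the mismatch is coming from the softmax implementation.
```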
@liuhao-97 I tried to profile the layers but came up with a `graph disconnected` error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32, 1), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "prune_low_magnitude_conv1". The following previous layers were accessed without issue: []
The full-precision model works fine; for me the accuracy drops only for the quantized hls model.
Can you print your output? Does it consist of some repeated number and some zeros, like [0.25, 0.25, 0.25, 0, 0, 0]?
I am not able to print the output yet.
Did you prune the model? I think a quantized pruned model can't work fine with hls4ml. Maybe you can try a quantized model but don't prune it.
Besides, which hls4ml are you using? hls4ml 0.6.0 or the newest branch?
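On the pruning point: if the model was trained with the TensorFlow Model Optimization pruning wrappers (the `prune_low_magnitude_*` names in the traceback suggest it was), one option is to strip the wrappers before handing the model to hls4ml. A rough sketch, with `pruned_qmodel` as a placeholder name:

```python
import tensorflow_model_optimization as tfmot

# Remove the pruning wrappers; the weights keep the sparsity learned during training
stripped_model = tfmot.sparsity.keras.strip_pruning(pruned_qmodel)
stripped_model.summary()

# Then build the hls4ml config from `stripped_model` instead of the wrapped model:
# hls_config_q = hls4ml.utils.config_from_keras_model(stripped_model, granularity='name')
```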
I am using hls4ml 0.6.0. Has this issue been resolved in the new branch?
Not sure. You can have a try with the new branch. Besides, have you tried with "io_type='io_parallel'"? Maybe it can solve the problem.
https://github.com/fastmachinelearning/hls4ml/pull/448 Maybe you can check this.
@liuhao-97 I tried but it still did not work for me. Did you get a workaround to make sure that the accuracy does not drop?
I think it is because you pruned the model. When you prune the model, somehow the original layer-by-layer connections of the model go wrong, which can be seen in your error.
Can you try with a non-pruned model again to see if there is still an accuracy loss?
@liuhao-97 yes, the error has been removed after I removed pruning. Thank you.
@jmduarte models that have concatenate or add layers have a considerable accuracy drop. Might be a bug.