
Number of filters limitation

Open wilfredkisku opened this issue 2 years ago • 22 comments

Is there a limitation on the number of filters in a CNN? A layer with 32 filters tends to be the bottleneck during synthesis: the synthesis is unable to complete and gets stuck at the conv2d layer with 32 filters.

wilfredkisku avatar Jun 16 '22 20:06 wilfredkisku

Depends on your config. Assuming you use io_stream, the limit will be related to the strategy used, since that affects the algorithm used for the CNN kernel. If you use latency strategy (the default), then filt_height x filt_width x n_channels x n_filters < 4096. If you use io_parallel, well, you shouldn't be using it with large models.
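
A quick back-of-the-envelope check of that limit for a given layer might look like the following sketch (the 4096 figure is the one quoted above; the layer dimensions are example values, not taken from any specific model):

# Rough check of the Latency-strategy unroll limit mentioned above; the layer
# dimensions here are example values only.
filt_height, filt_width, n_channels, n_filters = 3, 3, 16, 32

n_mult = filt_height * filt_width * n_channels * n_filters
print(f"Unrolled multiplications: {n_mult}")
if n_mult >= 4096:
    print("Over the ~4096 limit for the Latency strategy: "
          "use the Resource strategy or reduce filters/channels.")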

vloncar avatar Jun 17 '22 12:06 vloncar

I have used io_stream and the Resource strategy. Another issue I am facing is that when I use the config file to generate the hls4ml model from the quantized model, the accuracy drops from ~75% to ~10%. I checked that the baseline quantized model predicts with an accuracy of around ~71%, but the hls4ml model's accuracy drops after synthesizing the model.

wilfredkisku avatar Jun 18 '22 19:06 wilfredkisku

Hi @wilfredkisku, can you share your model? You may want to look into the tracing / profiling functionality.

You can make 1D plots of the expected output vs hls4ml output like so: https://github.com/hls4ml-finn-mlperftiny/CIFAR10/blob/main/hls4ml/convert.py#L222-L233

[Example profiling plot of a QDense layer: expected output vs. hls4ml output distributions]

This can help you pinpoint which layers are causing a mismatch. Then you can increase the precision of those layers. Usually it's required to either increase the precision of the outputs or the accumulators (or both).
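
For reference, the tracing/profiling workflow looks roughly like the sketch below. The exact API may differ between hls4ml versions, and `qmodel`, `hls_model_q`, `hls_config_q`, and `X_test` are assumed to be the user's own objects:

import numpy as np
import hls4ml

# Numerical profiling: compare weight/activation distributions of the Keras
# model against the precisions chosen in the hls4ml config.
hls4ml.model.profiling.numerical(model=qmodel, hls_model=hls_model_q, X=X_test[:100])

# Per-layer tracing: enable 'Trace' on the layers you want to inspect, then
# compare the captured hls4ml outputs against the Keras outputs layer by layer.
for layer in hls_config_q['LayerName'].keys():
    hls_config_q['LayerName'][layer]['Trace'] = True
# ... rebuild and compile the hls model with this config before tracing ...
hls4ml_pred, hls4ml_trace = hls_model_q.trace(np.ascontiguousarray(X_test[:100]))
keras_trace = hls4ml.model.profiling.get_ymodel_keras(qmodel, X_test[:100])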

jmduarte avatar Jun 19 '22 14:06 jmduarte

Thanks @jmduarte for the help. I am including the model below to give a better picture of what I am trying to synthesize with hls4ml. I have been testing with an IFM bit precision of 4 and a weight precision of 12, and also 16 and 16, but the accuracy of the Keras model and the hls4ml model still differ a lot.

from tensorflow.keras.layers import Input, Activation, Add, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1
from qkeras import QActivation
from qkeras import QDense, QConv2DBatchnorm

IFM = 16
WGT = 16

def QResNet9(input_shape = (32,32,3), classes = 10):
    img_input = Input(shape=input_shape)
    x = QActivation('quantized_relu('+str(IFM)+',0)',name='relu_in')(img_input)

    x = QConv2DBatchnorm(16, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv1', use_bias = True)(x)
    x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv1')(x)

    x = QConv2DBatchnorm(16, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv2', use_bias = True)(x)
    x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool1')(x)

    x_skip = x
    x_skip = QConv2DBatchnorm(16, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv3', use_bias = True)(x_skip)
    x_skip = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv3_skip')(x_skip)
    
    x = QConv2DBatchnorm(16, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv4', use_bias = True)(x)
    x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv4')(x)

    #x = QConv2DBatchnorm(16, kernel_size = (3,3), strides=(1,1),
    #                     kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',  
    #                     kernel_regularizer=l1(0.0001), name='conv4', use_bias = True)(x)
    #x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv4')(x)

    x = Add()([x, x_skip])

    x = QConv2DBatchnorm(24, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv5', use_bias = True)(x)
    x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv5')(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool2')(x)

    #x = QConv2DBatchnorm(32, kernel_size = (3,3), strides=(1,1),
    #                     kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',  
    #                     kernel_regularizer=l1(0.0001), name='conv6', use_bias = True)(x)
    #x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv6')(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool3')(x)

    x_skip = x
    x_skip = QConv2DBatchnorm(24, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv6', use_bias = True)(x_skip)
    x_skip = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv6_skip')(x_skip)
    
    x = QConv2DBatchnorm(24, kernel_size = (3,3), strides=(1,1),
                         kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
                         kernel_initializer='lecun_uniform',  
                         kernel_regularizer=l1(0.0001), name='conv7', use_bias = True)(x)
    x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv7')(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool4')(x)

    #x = QConv2DBatchnorm(32, kernel_size = (3,3), strides=(1,1),
    #                     kernel_quantizer="quantized_bits("+str(WGT)+",0,alpha=1)",
    #                     kernel_initializer='lecun_uniform',  
    #                     kernel_regularizer=l1(0.0001), name='conv8', use_bias = True)(x)
    #x = QActivation('quantized_relu('+str(IFM)+',0)', name='relu_conv8')(x)
  
    x = Add()([x, x_skip])
    x = MaxPooling2D()(x)
    #x = MaxPooling2D(pool_size=(2, 2), name='pool5')(x)

    x = Flatten()(x)
    x = Dense(10,name='output_dense')(x)
    x_out = Activation('softmax',name='output_softmax')(x)

    qmodel = Model(inputs=[img_input], outputs=[x_out], name='qkeras')
    return qmodel

Accuracy Keras:  0.7033333333333334
Accuracy hls4ml: 0.11666666666666667

I am including the configuration for hls4ml that I have used.

# Then the QKeras model
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

hls_config_q = hls4ml.utils.config_from_keras_model(qmodel, granularity='name')
hls_config_q['Model']['Strategy'] = 'Resource'
hls_config_q['Model']['ReuseFactor'] = 144
hls_config_q['Model']['Precision'] = 'ap_fixed<16,6>'

hls_config_q['LayerName']['conv1']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv1']['ReuseFactor'] = 108

hls_config_q['LayerName']['conv2']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv2']['ReuseFactor'] = 144

hls_config_q['LayerName']['conv3']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv3']['ReuseFactor'] = 144

hls_config_q['LayerName']['conv4']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv4']['ReuseFactor'] = 144

hls_config_q['LayerName']['conv5']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv5']['ReuseFactor'] = 144

hls_config_q['LayerName']['conv6']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv6']['ReuseFactor'] = 144

hls_config_q['LayerName']['conv7']['Strategy'] = 'Resource'
hls_config_q['LayerName']['conv7']['ReuseFactor'] = 144

hls_config_q['LayerName']['output_dense']['Strategy'] = 'Resource'
hls_config_q['LayerName']['output_dense']['ReuseFactor'] = 160

hls_config_q['LayerName']['output_softmax']['Strategy'] = 'Stable'
plotting.print_dict(hls_config_q)
  
cfg_q = hls4ml.converters.create_config(backend='Vivado')
cfg_q['IOType']     = 'io_stream' # Must set this if using CNNs!
cfg_q['HLSConfig']  = hls_config_q
cfg_q['KerasModel'] = qmodel
cfg_q['OutputDir']  = 'quantized_cnn_model_C/'
cfg_q['XilinxPart'] = 'xczu7ev-ffvc1156-2-e'
#cfg_q['XilinxPart'] = 'xcu250-figd2104-2L-e'
  
hls_model_q = hls4ml.converters.keras_to_hls(cfg_q)
hls_model_q.compile()

A few other details I want to include:

  1. I am using Ubuntu installed in a VM with ~45 GB of RAM allocated to it.
  2. If I use more than 16 filters for any of the layers, the synthesis build gets stuck during loop unrolling of the convolutional layer. Is there a way I can increase the number of filters and still be able to synthesize without this issue?
  3. I am targeting a Xilinx UltraScale+ MPSoC ZCU104.

wilfredkisku avatar Jun 21 '22 17:06 wilfredkisku

Hi, have you solved your problem? I also ran into the accuracy problem when I tested on ResNet.

liuhao-97 avatar Jul 07 '22 08:07 liuhao-97

Hi, have you solved your problem? Maybe you can compare the output of the "output_softmax" layer between the hls_model and the Keras model. This is my issue: https://github.com/fastmachinelearning/hls4ml/issues/590

liuhao-97 avatar Jul 07 '22 12:07 liuhao-97

@liuhao-97 thank you for the reply. No, I could not get it corrected; it still has the accuracy drop. Is there anything else to rectify the issue you pointed out?

wilfredkisku avatar Jul 07 '22 13:07 wilfredkisku

Hi, have you tried with a full-precision model (ap_fixed<32,16>)? I mean, don't quantize the model and set the hls config to ap_fixed<32,16>. Maybe you can check the output of the last softmax layer between the Keras model and the hls model.
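
In practice that check might look like the sketch below, using the same 0.6.0-style conversion calls as the config posted earlier. `default_precision` is a real option of `config_from_keras_model`; `model` and `X_test` are assumed to be the unquantized Keras model and test data:

import hls4ml

# Build a config with a wide fixed-point type everywhere, so the HLS datapath
# precision is effectively ruled out as the source of the accuracy drop.
hls_config_fp = hls4ml.utils.config_from_keras_model(
    model, granularity='name', default_precision='ap_fixed<32,16>')

cfg_fp = hls4ml.converters.create_config(backend='Vivado')
cfg_fp['IOType'] = 'io_stream'
cfg_fp['HLSConfig'] = hls_config_fp
cfg_fp['KerasModel'] = model
cfg_fp['OutputDir'] = 'full_precision_model/'

hls_model_fp = hls4ml.converters.keras_to_hls(cfg_fp)
hls_model_fp.compile()
# Compare hls_model_fp.predict(X_test) against model.predict(X_test).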

liuhao-97 avatar Jul 15 '22 18:07 liuhao-97

For me, I found there might be a problem with the softmax layer. I printed the output of the dense layer and it works fine, but for the softmax layer the output is totally different. If you check this link https://github.com/hls4ml-finn-mlperftiny/CIFAR10/blob/main/hls4ml/convert.py you will find that it removes the softmax layer, so I assume there might be a problem with the softmax layer.
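
One way to isolate this is to cut the softmax off the Keras model before conversion and compare argmax predictions from the dense logits, following the approach in the convert.py linked above. This is only a sketch: `qmodel`, `X_test`, one-hot `y_test`, and the layer name 'output_dense' are taken from the model posted earlier, and `hls_model_nosm` is a hypothetical name for the model converted without the softmax:

import numpy as np
from tensorflow.keras.models import Model

# Strip the final softmax: argmax over the dense logits gives the same class
# prediction, so accuracy can be compared without the softmax implementation.
qmodel_nosoftmax = Model(inputs=qmodel.inputs,
                         outputs=qmodel.get_layer('output_dense').output)

# Convert qmodel_nosoftmax with the same hls4ml config as before
# (here called hls_model_nosm), then compare argmax predictions:
y_keras = qmodel_nosoftmax.predict(X_test)
y_hls = hls_model_nosm.predict(np.ascontiguousarray(X_test))
acc_keras = np.mean(np.argmax(y_keras, axis=1) == np.argmax(y_test, axis=1))
acc_hls = np.mean(np.argmax(y_hls, axis=1) == np.argmax(y_test, axis=1))
print(f"Keras accuracy (no softmax):  {acc_keras:.4f}")
print(f"hls4ml accuracy (no softmax): {acc_hls:.4f}")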

liuhao-97 avatar Jul 15 '22 18:07 liuhao-97

@liuhao-97 I tried to profile the layer but came up with a "graph disconnected" error.

ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32, 1), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "prune_low_magnitude_conv1". The following previous layers were accessed without issue: []

wilfredkisku avatar Jul 16 '22 12:07 wilfredkisku

> Hi, have you tried with a full-precision model (ap_fixed<32,16>)? I mean, don't quantize the model and set the hls config to ap_fixed<32,16>. Maybe you can check the output of the last softmax layer between the Keras model and the hls model.

The full-precision model works fine; for me the accuracy drops only for the quantized hls model.

wilfredkisku avatar Jul 16 '22 12:07 wilfredkisku

Can you print your output? Does it consist of some repeated number and some zeros, like [0.25, 0.25, 0.25, 0, 0, 0]?

liuhao-97 avatar Jul 16 '22 12:07 liuhao-97

I am not able to print the output yet.

wilfredkisku avatar Jul 16 '22 15:07 wilfredkisku

> @liuhao-97 I tried to profile the layer but came up with a "graph disconnected" error.
>
> ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32, 1), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "prune_low_magnitude_conv1". The following previous layers were accessed without issue: []

Did you prune the model? I don't think a quantized and pruned model works correctly with hls4ml. Maybe you can try a quantized model without pruning it.
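
If the model was trained with the TensorFlow Model Optimization pruning wrappers (the `prune_low_magnitude` prefix in the error suggests so), stripping them before handing the model to hls4ml usually avoids this kind of error. A short sketch; `pruned_model` is a placeholder name for the wrapped model:

import tensorflow_model_optimization as tfmot

# Remove the prune_low_magnitude wrappers so hls4ml (and profiling) see plain
# (Q)Keras layers; the zeroed weights learned during pruning are kept as-is.
stripped_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
# Convert and profile stripped_model instead of the wrapped, pruned model.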

liuhao-97 avatar Jul 17 '22 14:07 liuhao-97

Besides, which hls4ml version are you using? hls4ml 0.6.0 or the newest branch?

liuhao-97 avatar Jul 18 '22 17:07 liuhao-97

I am using hls4ml 0.6.0. Has this issue been resolved in the new branch?

wilfredkisku avatar Jul 18 '22 17:07 wilfredkisku

> I am using hls4ml 0.6.0. Has this issue been resolved in the new branch?

Not sure. You can have a try with the new branch. Besides, have you tried with "io_type='io_parallel'"? Maybe it can solve the problem.

liuhao-97 avatar Jul 19 '22 07:07 liuhao-97

> I am using hls4ml 0.6.0. Has this issue been resolved in the new branch?
>
> Not sure. You can have a try with the new branch. Besides, have you tried with "io_type='io_parallel'"? Maybe it can solve the problem.

https://github.com/fastmachinelearning/hls4ml/pull/448 Maybe you can check this.

liuhao-97 avatar Jul 19 '22 13:07 liuhao-97

@liuhao-97 I tried but it still did not work for me. Did you get a workaround to make sure that the accuracy does not drop?

wilfredkisku avatar Jul 19 '22 15:07 wilfredkisku

I think it is because you pruned the model. When you prune the model, the layer-by-layer connections of the original model somehow go wrong, which can be seen in your error.
Can you try again with a non-pruned model to see if there is still an accuracy loss?

> @liuhao-97 I tried to profile the layer but came up with a "graph disconnected" error.
>
> ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32, 1), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "prune_low_magnitude_conv1". The following previous layers were accessed without issue: []

liuhao-97 avatar Jul 20 '22 08:07 liuhao-97

@liuhao-97 yes, the error went away after I removed pruning. Thank you.

wilfredkisku avatar Jul 20 '22 08:07 wilfredkisku

@jmduarte models that have Concatenate or Add layers show a considerable accuracy drop. This might be a bug.

wilfredkisku avatar Jul 21 '22 10:07 wilfredkisku