
FPGA Output is Zero in CNN model with 8,512 parameters.

Open zsrabbani opened this issue 1 year ago • 8 comments

I have a CNN model. I used hls4ml, and all files, including the bitfile, were generated successfully. When I run the deployment code on the FPGA (ZCU104), the prediction output of the FPGA is always zero.

Total params: 8,512 (33.25 KB)
Trainable params: 8,344 (32.59 KB)
Non-trainable params: 168 (672.00 Byte)

I would appreciate any help.

Here is the Model:

rf_in = Input(shape=(1024, 2), name = 'rf_input')

x = Conv1D(16, 7, activation=None, padding='same', use_bias=False)(rf_in)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 7, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 5, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(8, 5, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(8, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(4, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Flatten()(x)

dense_1 = Dense(64, activation='relu', use_bias=False)(x)
dropout_1 = Dropout(0.35)(dense_1)
dense_2 = Dense(16, activation='relu', use_bias=False)(dropout_1)
dropout_2 = Dropout(0.55)(dense_2)
softmax = Dense(7, activation='softmax', use_bias=False)(dropout_2)

model = keras.Model(rf_in, softmax)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=["accuracy"])

model.summary()

Here is the hls4ml conversion code:

[screenshot of the hls4ml conversion code; not reproduced]
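Since the screenshot is not reproduced here, a typical hls4ml conversion for a model like this might look like the sketch below. The precision, reuse factor, board name, and output directory are illustrative assumptions, not the poster's actual settings; the commented-out calls require hls4ml and Vivado to be installed.

```python
# hls4ml configuration sketch (all values are illustrative assumptions).
hls_config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",  # too-narrow fixed-point types are a common cause of all-zero outputs
        "ReuseFactor": 1,
        "Strategy": "Latency",
    },
}

# With hls4ml installed, the conversion would look roughly like:
# import hls4ml
# hls_model = hls4ml.converters.convert_from_keras_model(
#     model,
#     hls_config=hls_config,
#     backend='VivadoAccelerator',   # backend that produces a deployable bitfile design
#     board='zcu104',                # board name is an assumption for this thread's target
#     io_type='io_stream',           # streaming IO is required for Conv1D/Conv2D layers
#     output_dir='hls4ml_prj',
# )
# hls_model.compile()
# print(hls_model.predict(x))       # sanity-check against model.predict(x) before synthesis

print(hls_config["Model"]["Strategy"])
```

Comparing `hls_model.predict(x)` with `model.predict(x)` before building the bitfile is the quickest way to catch an all-zero output on the CPU rather than on the board.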

Here is the deployment code:

[screenshot of the deployment code; not reproduced]

zsrabbani avatar Aug 08 '24 10:08 zsrabbani

I can confirm that I am encountering similar behaviour: I am using the standard CNN from the hls4ml tutorial quantized at 6 bits, and the prediction output I get is also random. Notably, this only occurs when I use the Resource strategy; I observe considerable accuracy loss (from 84% to 18%) just by switching from Latency to Resource mode.

GeorgeMentzos avatar Aug 08 '24 11:08 GeorgeMentzos

Try to get a better understanding from the documentation of how the configuration of types works and what the effects of fixed precision and quantization are, and ultimately profile your application. See the tutorial, especially parts 2 and 4.
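As a rough illustration of why the precision configuration matters, the pure-Python sketch below (not hls4ml code) mimics an `ap_fixed<W,I>`-style type: `W` total bits, `I` integer bits, with saturation on overflow. With too few fractional bits, small activations round to exactly zero, which is one way an entire network's output can collapse to zeros.

```python
def quantize_fixed(value, total_bits, int_bits):
    """Mimic an ap_fixed<total_bits, int_bits>-style type with saturation.

    Simplified model for illustration only; real HLS fixed-point types
    offer several rounding and overflow modes.
    """
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))      # most negative raw code
    hi = (1 << (total_bits - 1)) - 1   # most positive raw code
    raw = int(round(value * scale))
    raw = max(lo, min(hi, raw))        # saturate on overflow
    return raw / scale

# Plenty of fractional bits: the value survives almost intact.
print(quantize_fixed(3.14159, 16, 6))   # -> 3.1416015625
# Too few fractional bits: a small activation rounds to exactly zero.
print(quantize_fixed(0.004, 16, 14))    # -> 0.0
# Too few integer bits: a large value saturates at the type's maximum.
print(quantize_fixed(100.0, 8, 4))      # -> 7.9375
```

Profiling (tutorial part 2) shows you where each layer's values actually live, so you can size the integer and fractional bits per layer instead of guessing.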

vloncar avatar Aug 08 '24 14:08 vloncar

As you can see, I used the correct setup but did not get any results. Could you help me with it?

zsrabbani avatar Aug 09 '24 08:08 zsrabbani

Hi, I've encountered the same issue. I am using the extension API example, KReverse: https://fastmachinelearning.org/hls4ml/advanced/extension.html. I used the VivadoAccelerator backend and got the final hardware block, but when I deploy the hardware on a PYNQ-Z2 board, I get only zero-filled output.

returnwellbeing avatar Sep 03 '24 12:09 returnwellbeing

Hi, I would suggest first of all checking whether hls_model.predict(x) (the HLS model simulated on the CPU) matches model.predict(x) (the Keras model); they should at least be close to each other. If they are not, the problem can be related to the accumulator data types in the network. You can try using 'auto' so that the size of each accumulator is inferred from the operations that use it. The branch jmitrevs:keras-config-auto can be helpful for using 'auto' to properly handle accumulator data types.
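A minimal sketch of this check, with a hypothetical comparison helper: in a real flow the two vectors would come from `model.predict(x)` and `hls_model.predict(x)` on the same input batch, and the tolerance would be chosen to suit the application.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length
    prediction vectors (flattened to plain lists here)."""
    assert len(a) == len(b), "prediction shapes must match"
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical 7-class outputs for one sample: Keras vs. HLS C simulation.
keras_out = [0.01, 0.02, 0.90, 0.03, 0.01, 0.02, 0.01]
hls_out   = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]  # the all-zero symptom

diff = max_abs_diff(keras_out, hls_out)
if diff > 0.05:  # tolerance is an arbitrary choice for illustration
    print(f"mismatch: max abs diff = {diff}")
```

If the CPU simulation already disagrees with Keras, the problem is in the conversion (precision, accumulators, strategy), not in the board deployment, which narrows the search considerably.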

nghielme avatar Sep 03 '24 13:09 nghielme

@nghielme Thanks for the suggestion. I found that hls_model.predict(x) and model.predict(x) are indeed different. Your advice was a great help in finding the cause.

@zsrabbani In my case, there were some errors when generating {OUTPUT_DIR}/firmware/myproject.cpp. There must be some function calls at the end of myproject.cpp; please check yours.

void myproject(
    // Here are some inputs
) {

    // hls-fpga-machine-learning insert IO
    // Here are some pragmas

#ifndef __SYNTHESIS__
    static bool loaded_weights = false;
    if (!loaded_weights) {
        // hls-fpga-machine-learning insert load weights
        loaded_weights = true;
    }
#endif

    // ****************************************
    // NETWORK INSTANTIATION
    // ****************************************

    // hls-fpga-machine-learning insert layers
    // Some function calls should be here. if not, the outputs of hardware block are always ZERO
}
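One quick way to automate the check described above is to scan the generated myproject.cpp and verify that real code, not just the marker comment, follows the "insert layers" line. The helper below is a sketch; the two file contents are made-up stand-ins, not real hls4ml output.

```python
def has_layer_calls(cpp_source):
    """Return True if any non-empty, non-comment line appears between the
    'insert layers' marker and the end of the file."""
    marker = "// hls-fpga-machine-learning insert layers"
    _, sep, tail = cpp_source.partition(marker)
    if not sep:
        return False  # marker missing entirely
    for line in tail.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//") and stripped != "}":
            return True  # found an actual layer function call
    return False

# Stand-in examples: one file with a layer call, one without.
good = ("void myproject() {\n"
        "    // hls-fpga-machine-learning insert layers\n"
        "    nnet::dense<input_t, result_t, config2>(in, out, w2);\n"
        "}\n")
bad = ("void myproject() {\n"
       "    // hls-fpga-machine-learning insert layers\n"
       "}\n")

print(has_layer_calls(good), has_layer_calls(bad))  # -> True False
```

In practice you would read the real file with `open(f"{OUTPUT_DIR}/firmware/myproject.cpp").read()` and run the same check after every conversion.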

returnwellbeing avatar Sep 06 '24 05:09 returnwellbeing

@returnwellbeing When I check myproject.cpp, everything looks fine, and I don't see that missing-function-calls symptom.

zsrabbani avatar Sep 06 '24 07:09 zsrabbani

> I can confirm that I am also encountering similar behaviour where I am using the standard CNN from the hls4ml tutorial quantized at 6-bits and the prediction output I am getting is also random. I would like to add that this only occurs when I am using the Resource Strategy where I am observing considerable accuracy loss (from 84% to 18%) just by switching from Latency to Resource mode.

Hi, I am facing the same problem. When I change the strategy from Latency to Resource, the prediction accuracy drops a lot, and profiling shows that the converted layers' results are far from the Keras model's. I am using hls4ml 1.1.0.

Irisaka avatar Jul 14 '25 03:07 Irisaka