hls4ml
Reduce BRAM usage? Network input issue
Hello. I'm working with a simple network for a signal classifier. I adapted some Tcl files to be able to target a Zybo Z7-10, which is smaller than a Pynq-Z2. Since it has fewer resources, I had to adapt my original network and reduce the size of the input. These are my utilization estimates:
```
== Utilization Estimates
================================================================
* Summary:
+-----------------+---------+-------+-------+-------+------+
|       Name      | BRAM_18K| DSP48E|   FF  |  LUT  | URAM |
+-----------------+---------+-------+-------+-------+------+
|DSP              |        -|      -|      -|      -|     -|
|Expression       |        -|      -|      0|     24|     -|
|FIFO             |      128|      -|   5175|   6952|     -|
|Instance         |        8|      2|  21259|  38052|     -|
|Memory           |        -|      -|      -|      -|     -|
|Multiplexer      |        -|      -|      -|     45|     -|
|Register         |        -|      -|      5|      -|     -|
+-----------------+---------+-------+-------+-------+------+
|Total            |      136|      2|  26439|  45073|     0|
+-----------------+---------+-------+-------+-------+------+
|Available        |      120|     80|  35200|  17600|     0|
+-----------------+---------+-------+-------+-------+------+
|Utilization (%)  |      113|      2|     75|    256|     0|
+-----------------+---------+-------+-------+-------+------+
```
As you can see, the BRAM exceeds the maximum of the board, so I can't synthesize the final design. Is there a way to reduce the BRAM usage? The input of the network has 128 data points. Train data: X=(2304, 128), Y=(2304, 5). Test data: X=(256, 128), Y=(256, 5).
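A rough back-of-envelope sketch of where the 128 BRAMs in the FIFO row come from (this accounting is an assumption on my part; the exact mapping is the tool's choice, but the `fifo_w16_d128` FIFOs in the synthesis log line up with it):

```python
# Back-of-envelope BRAM estimate for io_stream input FIFOs.
# Assumption: Vivado HLS maps each stream FIFO it implements in block RAM
# to at least one whole BRAM_18K, regardless of how full the FIFO is.
n_inputs = 128          # network input size
fifo_width_bits = 16    # ap_fixed<16,6> element width
fifo_depth = 128        # default FIFO depth (the 'fifo_w16_d128' in the log)

bits_per_fifo = fifo_width_bits * fifo_depth   # bits each FIFO actually stores
bram18k_bits = 18 * 1024                       # capacity of one BRAM_18K

# Each of the 128 input streams gets its own FIFO, hence its own BRAM,
# even though each FIFO fills only a fraction of an 18 Kb block.
fifo_brams = n_inputs * 1
fill_ratio = bits_per_fifo / bram18k_bits

print(fifo_brams)            # 128, matching the 'FIFO' row in the report
print(round(fill_ratio, 2))  # 0.11: most of each BRAM sits unused
```

If this accounting holds, the BRAM cost is driven by the *number* of streams, not by how much data they hold, which is why shrinking the input from 128 to 64 halves the FIFO BRAM count.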
Network (predict accuracy: 80%):

```python
import random

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l1
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

def create_model(int_bits=2, n_bits=12):
    # Network: 128 (input), 40 (dense), 5 (dense output)
    k_inic = random.choice(["lecun_uniform"])  # 'glorot_uniform'
    model = Sequential()
    model.add(QDense(40, kernel_quantizer=quantized_bits(n_bits, int_bits, alpha=1),
                     bias_quantizer=quantized_bits(n_bits, int_bits, alpha=1),
                     kernel_initializer=k_inic,
                     kernel_regularizer=l1(0.0001),
                     name="input_dense"))
    model.add(QActivation(activation=quantized_relu(n_bits), name='Relu3'))
    model.add(QDense(5, kernel_quantizer=quantized_bits(n_bits, int_bits, alpha=1),
                     bias_quantizer=quantized_bits(n_bits, int_bits, alpha=1),
                     kernel_initializer=k_inic,
                     kernel_regularizer=l1(0.0001),
                     name="out_dense"))
    model.add(QActivation(activation=quantized_relu(n_bits), name='softmax'))
    opt = keras.optimizers.Adam()
    return model
```
hls4ml config (predict accuracy: 79.3%):

```python
import hls4ml
import tensorflow as tf
from hls4ml.converters import convert_from_keras_model

hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
precision = "ap_fixed<16,6>"
config['Model'] = {}
config['Model']['ReuseFactor'] = 4000
config['Model']['Strategy'] = 'Resource'
config['Model']['Precision'] = precision
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.Flatten):
        config['LayerName'][layer.name] = {}
        config['LayerName'][layer.name]['Precision'] = precision
        config['LayerName'][layer.name]['ReuseFactor'] = 4000
        config['LayerName'][layer.name]['Strategy'] = 'Resource'
# ...
# ...
hls_model = convert_from_keras_model(
    model=model, backend='VivadoAccelerator', io_type='io_stream',
    board='zybo-z7010', part='xc7z010clg400-1', hls_config=config,
    output_dir="{}".format(OUTPUT_FOLDER),
    input_data_tb=input_data_tb, output_data_tb=output_data_tb)
hls_model.compile()
```
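One thing worth checking in this config: if I recall correctly, hls4ml adjusts an invalid `ReuseFactor` to the closest valid value, where for a Dense layer the valid values are the divisors of `n_in * n_out` (that rule is my assumption here, not something from this thread). A quick sketch of what RF=4000 would become on these layers under that assumption:

```python
# Hypothetical check of how hls4ml might round ReuseFactor=4000.
# Assumed rule: valid reuse factors for a Dense layer divide n_in * n_out,
# and an invalid request is replaced by the closest valid value.
def valid_reuse_factors(n_in, n_out):
    """All divisors of the layer's multiplication count."""
    n_mult = n_in * n_out
    return [rf for rf in range(1, n_mult + 1) if n_mult % rf == 0]

def closest_valid(rf, n_in, n_out):
    """Valid reuse factor nearest to the requested one."""
    return min(valid_reuse_factors(n_in, n_out), key=lambda v: abs(v - rf))

# First Dense layer: 128 inputs x 40 outputs = 5120 multiplications
print(closest_valid(4000, 128, 40))   # 5120: fully serialized, one DSP
# Output Dense layer: 40 inputs x 5 outputs = 200 multiplications
print(closest_valid(4000, 40, 5))     # 200: already fully serialized
```

With `Strategy: 'Resource'` and an effectively maximal reuse factor, the DSP count stays tiny (matching the 2 DSPs in the report), so the bottleneck really is the FIFO BRAM, not the compute.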
Reading the log, the framework does:

```
INFO: [RTMG 210-285] Implementing FIFO 'in_local_V_data_V_127_U(fifo_w16_d128_A)' using Block RAMs
```

So each input uses a BRAM. Is there a way to change that? Or to do the design with hls4ml and then modify something in Vivado? The hls4ml parameters don't affect the BRAM usage, and 1 BRAM is used for each input. I don't have timing constraints, so latency is not an issue.
I can adapt my data to split the 128 points in two so I have a 64-input network, but that is not ideal (64-40-5). The original network on the PC had 1024 or 512 data points, so reducing it this much is not ideal. With 64 inputs the process works (great) and I can run it on the PYNQ framework, but I want to try to improve the input size.
I know that a small board isn't ideal, but I can't use a Pynq-Z2 or another bigger board for this project. For the Zybo I've seen a ResNet-20 implemented (manually in VHDL), so I know the board has the resources, but hls4ml is resource-consuming (I don't know if that's because of HLS in general or the hls4ml tool).
Besides that, it's an excellent tool that I'm basing my final thesis on.
Hi, thanks for the query. Something that you can try quickly: `io_type='io_parallel'` instead of `io_type='io_stream'`. That may help to remove those FIFOs, and since the NN doesn't contain any Conv layers it should still synthesize okay.
Otherwise, @nicologhielmetti is working on an optimization specifically targeting the resource usage of these FIFOs that might be relevant here (when `io_type='io_stream'`). Nicolò, do you have a branch that you could point @themachho to in order to try it out?
> I adapted some tcl files to be able to target a ZyboZ7-10

This is cool! Perhaps when you have the project working you'd consider making a PR with support for the board!
Sure, @themachho, here is the branch for the FIFO optimization: https://github.com/nicologhielmetti/hls4ml/tree/fifo_depth_opt. I would suggest you use this commit: 888713b5d4dce0c4be0791794eca236892021819.
UPDATE: I cleaned the unrelated commits out of the repo. You can now refer to the repo without specifying the commit.
I found that if you set `Backend: VivadoAccelerator`, the BRAM usage will be a lot lower. For me it cut the usage in half. Why is that?
These are the results with Vivado as the backend:
These with VivadoAccelerator as the backend:
This is my model, trained on MNIST.
> Sure, @themachho here the branch for FIFO optimization --> https://github.com/nicologhielmetti/hls4ml/tree/fifo_depth_opt I would suggest you to use to this commit --> 888713b
> UPDATE I cleaned the repo from the unrelated commit. You can now refer to the repo, avoiding to specify the commit
Hello. I had the same problem; the BRAM usage exceeded the maximum of my board. I was using hls4ml 0.6.0. I tried the branch that @nicologhielmetti pointed to, but now I am getting an error running the build method. The error message says that the directory of the project does not exist. Do you know what I am doing wrong? Could you please help me? Thanks.
This is the error message:

```
****** Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC v2020.1 (64-bit)
  **** SW Build 2902540 on Wed May 27 19:54:35 MDT 2020
  **** IP Build 2902112 on Wed May 27 22:43:36 MDT 2020
    ** Copyright 1986-2020 Xilinx, Inc. All Rights Reserved.

source /tools/Xilinx/Vivado/2020.1/scripts/vivado_hls/hls.tcl -notrace
INFO: [HLS 200-10] Running '/tools/Xilinx/Vivado/2020.1/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'vivado' on host '764847c5e8b6' (Linux_x86_64 version 4.19.0-18-cloud-amd64) on Thu Jan 13 14:50:25 UTC 2022
INFO: [HLS 200-10] On os Debian GNU/Linux 10 (buster)
INFO: [HLS 200-10] In directory '/home/vivado/hls4ml/example-models/cnn_pruned'
Sourcing Tcl script 'build_prj.tcl'
./project.tcl can't be opened.
couldn't read file "./project.tcl": no such file or directory
    while executing
"source [file join $tcldir project.tcl]"
    (file "build_prj.tcl" line 15)
    invoked from within
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 [list source $arg] "
INFO: [Common 17-206] Exiting vivado_hls at Thu Jan 13 14:50:25 2022...
```