FIFO depth optimizer for the Vitis backend, using the built-in cosim FIFO profiling feature of Vitis HLS
- Inserted the new optimizer into the Vitis backend flows
- Implemented the optimizer. The steps are similar to the Vivado optimizer; only the profiling and FIFO depth parsing parts change
- Updated the Vitis writer to overwrite the `project.tcl` file with Vitis attributes
- Updated the `build_prj.tcl` file to skip the `add_vcd_instructions_tcl` function for the Vitis backend
- Added a pytest that checks that the depths decreased after the optimization and that cosim with the optimized depths passes without detecting any deadlocks
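As a rough sketch of the "FIFO depth parsing" step above (the report structure and field names `channels`, `name`, `max_depth` are assumptions for illustration, not the actual Vitis HLS cosim profiling format):

```python
import json

def parse_fifo_depths(report_json):
    """Extract observed maximum FIFO depths from a profiling report.

    The field names used here are illustrative assumptions; the real
    Vitis HLS cosim FIFO profiling report uses its own schema.
    """
    report = json.loads(report_json)
    return {ch['name']: int(ch['max_depth']) for ch in report['channels']}

def clamp_depths(observed, minimum=2):
    # Keep a safe lower bound so no stream FIFO ends up shallower than `minimum`.
    return {name: max(depth, minimum) for name, depth in observed.items()}

# Made-up profiling data for demonstration only
example = ('{"channels": [{"name": "layer2_out", "max_depth": 4},'
           ' {"name": "layer3_out", "max_depth": 1}]}')
depths = clamp_depths(parse_fifo_depths(example))
# depths == {'layer2_out': 4, 'layer3_out': 2}
```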
Type of change
- [x] Documentation update
- [x] New feature (non-breaking change which adds functionality)
Tests
- Used Vitis HLS 2023.2
- Dummy dense model example. However, optimized depths are the same as the initial ones set by hls4ml
```python
import hls4ml
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
model.add(Dense(32, name='fc2', activation='relu'))
model.add(Dense(32, name='fc3', activation='relu'))
model.add(Dense(5, name='fc4', activation='softmax'))

config = hls4ml.utils.config_from_keras_model(model, default_precision='ap_fixed<8, 4>')
config['Flows'] = ['vitis:fifo_depth_optimization']
hls4ml.model.optimizer.get_optimizer('vitis:fifo_depth_optimization').configure(profiling_fifo_depth=200_000)

output_dir = 'hls4mlprj_fifo_depth_opt_vitis'
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       part='xc7z020clg400-1',
                                                       backend='Vitis')

hls_model.build(reset=False, csim=False, synth=True, cosim=True)
hls4ml.report.read_vivado_report(output_dir)
```
- Tested the model above with Vivado backend to ensure nothing broke
- 10k param U-Net model. After running cosim with the optimized depths, no deadlocks are detected by Vitis
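As a small sketch of the deadlock check mentioned above (the marker string is an assumption; match it to the exact wording your Vitis HLS version prints in the cosim log):

```python
def cosim_detects_deadlock(log_text):
    """Scan a cosim log for a deadlock message.

    Assumes the log mentions the word 'deadlock' on failure; verify
    against the actual Vitis HLS output before relying on this.
    """
    return 'deadlock' in log_text.lower()

# Illustrative log lines, not real Vitis output
clean_log = 'INFO: [COSIM] C/RTL co-simulation finished: PASS'
bad_log = 'ERROR: [COSIM] Deadlock detected in the design'
```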
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [ ] I have added tests that prove my fix is effective or that my feature works.
Hey @vloncar @jmitrevs, do you think a unit test is required? To avoid delaying the CI pipeline, I could add a dummy model that verifies synthesis and that the flow is executed. Otherwise, I can add a complete unit test that also checks the FIFO depth results.
I think it would be good to have a full test, but until we improve the synthesis testing pipeline, maybe mark it with @pytest.mark.skip(reason='Skipping synthesis tests for now'). This way we could run it locally when we review your code and it won't cause issues in the CI.
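Following the suggestion above, a skipped synthesis test could be sketched like this (the test body is a hypothetical outline; the `depths_nonincreasing` helper and the variable names in the comments are illustrative, not part of the actual PR):

```python
import pytest

def depths_nonincreasing(initial, optimized):
    """True iff every optimized FIFO depth is <= its initial depth."""
    return all(optimized[name] <= depth for name, depth in initial.items())

@pytest.mark.skip(reason='Skipping synthesis tests for now')
def test_fifo_depth_optimization_vitis():
    # Hypothetical outline: build the dummy dense model from this PR,
    # run the 'vitis:fifo_depth_optimization' flow plus cosim, then:
    #   assert depths_nonincreasing(initial_depths, optimized_depths)
    #   assert 'deadlock' not in cosim_log.lower()
    pass
```

Marked with `@pytest.mark.skip`, it stays out of CI but can still be run locally by deselecting the skip during review.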
To the reviewers: when we agree on how we want to test this, I can open an issue to also include the Vivado/VivadoAccelerator FIFO optimization in the tests, as a future PR, as some changes in the initial implementation of those backends will be needed
After discussing, some more todos:
- [x] Update the hls4ml documentation by including an example of this feature with the Vitis backend
- [x] Include a U-Net model in the pytest
- There was a discussion about using Automated FIFO Sizing instead of Global FIFO Sizing (source). @jmitrevs
This looks good and ready to be merged. One change I made (which I think is long overdue) is to ensure multiple iterations of the top function are called when the user doesn't provide any input. All co-simulation results with io_stream are invalid when only one input is passed: not just the FIFO depths, but also latency and II. This PR originally suggested calling the top function twice, but in the past I have observed that for some ResNet blocks I needed to run cosim 4 times (3 times to see the buffers fill up, a 4th time to be sure they stopped increasing). I set it to 5 to be safe. For large models this may take a lot of time, so we shouldn't go to much larger numbers as a default.