FIFO depth optimizer for the Vitis backend, using the built-in cosim FIFO profiling feature of Vitis HLS
- Inserted the new optimizer into the Vitis backend flows
- Implemented the optimizer. The steps are similar to the Vivado optimizer; only the profiling and FIFO depth parsing parts change
- Updated the Vitis writer to overwrite the `project.tcl` file with Vitis attributes
- Updated the `build_prj.tcl` file to skip the `add_vcd_instructions_tcl` function for the Vitis backend
- Added a pytest that checks that the depths decreased after the optimization and that cosim with the optimized depths passes without detecting any deadlocks
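As a rough sketch of the "FIFO depth parsing" step above (the report structure and field names `channels`, `name`, `max_depth` are assumptions for illustration, not the actual Vitis HLS cosim profiling format):

```python
import json

def parse_fifo_depths(report_json):
    """Extract observed maximum FIFO depths from a profiling report.

    The field names used here are illustrative assumptions; the real
    Vitis HLS cosim FIFO profiling report uses its own schema.
    """
    report = json.loads(report_json)
    return {ch['name']: int(ch['max_depth']) for ch in report['channels']}

def clamp_depths(observed, minimum=2):
    # Keep a safe lower bound so no stream FIFO ends up shallower than `minimum`.
    return {name: max(depth, minimum) for name, depth in observed.items()}

# Made-up profiling data for demonstration only
example = ('{"channels": [{"name": "layer2_out", "max_depth": 4},'
           ' {"name": "layer3_out", "max_depth": 1}]}')
depths = clamp_depths(parse_fifo_depths(example))
# depths == {'layer2_out': 4, 'layer3_out': 2}
```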
Type of change
- [x] Documentation update
- [x] New feature (non-breaking change which adds functionality)
Tests
- Used Vitis HLS 2023.2
- Dummy dense model example. However, optimized depths are the same as the initial ones set by hls4ml
```python
import hls4ml
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
model.add(Dense(32, name='fc2', activation='relu'))
model.add(Dense(32, name='fc3', activation='relu'))
model.add(Dense(5, name='fc4', activation='softmax'))

config = hls4ml.utils.config_from_keras_model(model, default_precision='ap_fixed<8, 4>')
config['Flows'] = ['vitis:fifo_depth_optimization']
hls4ml.model.optimizer.get_optimizer('vitis:fifo_depth_optimization').configure(profiling_fifo_depth=200_000)

output_dir = 'hls4mlprj_fifo_depth_opt_vitis'
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       io_type='io_stream',
                                                       hls_config=config,
                                                       output_dir=output_dir,
                                                       part='xc7z020clg400-1',
                                                       backend='Vitis')

hls_model.build(reset=False, csim=False, synth=True, cosim=True)
hls4ml.report.read_vivado_report(output_dir)
```
- Tested the model above with Vivado backend to ensure nothing broke
- 10k param U-Net model. After running cosim with the optimized depths, no deadlocks are detected by Vitis
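As a small sketch of the deadlock check mentioned above (the marker string is an assumption; match it to the exact wording your Vitis HLS version prints in the cosim log):

```python
def cosim_detects_deadlock(log_text):
    """Scan a cosim log for a deadlock message.

    Assumes the log mentions the word 'deadlock' on failure; verify
    against the actual Vitis HLS output before relying on this.
    """
    return 'deadlock' in log_text.lower()

# Illustrative log lines, not real Vitis output
clean_log = 'INFO: [COSIM] C/RTL co-simulation finished: PASS'
bad_log = 'ERROR: [COSIM] Deadlock detected in the design'
```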
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [ ] I have added tests that prove my fix is effective or that my feature works.
Hey @vloncar @jmitrevs, do you think a unit test is required? To avoid delaying the CI pipeline, I could add a dummy model that verifies synthesis and that the flow is executed. Otherwise, I can add a complete unit test that also checks the FIFO depth results.
I think it would be good to have a full test, but until we improve the synthesis testing pipeline, maybe mark it with @pytest.mark.skip(reason='Skipping synthesis tests for now'). This way we could run it locally when we review your code and it won't cause issues in the CI.
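Following the suggestion above, a skipped synthesis test could be sketched like this (the test body is a hypothetical outline; the `depths_nonincreasing` helper and the variable names in the comments are illustrative, not part of the actual PR):

```python
import pytest

def depths_nonincreasing(initial, optimized):
    """True iff every optimized FIFO depth is <= its initial depth."""
    return all(optimized[name] <= depth for name, depth in initial.items())

@pytest.mark.skip(reason='Skipping synthesis tests for now')
def test_fifo_depth_optimization_vitis():
    # Hypothetical outline: build the dummy dense model from this PR,
    # run the 'vitis:fifo_depth_optimization' flow plus cosim, then:
    #   assert depths_nonincreasing(initial_depths, optimized_depths)
    #   assert 'deadlock' not in cosim_log.lower()
    pass
```

Marked with `@pytest.mark.skip`, it stays out of CI but can still be run locally by deselecting the skip during review.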
To the reviewers: when we agree on how we want to test this, I can open an issue to also include the Vivado/VivadoAccelerator FIFO optimization in the tests, as a future PR, as some changes in the initial implementation of those backends will be needed
After discussing, some more todos:
- [x] Update the hls4ml documentation by including an example of this feature with the Vitis backend
- [x] Include a U-Net model in the pytest
- There was a discussion about using Automated FIFO Sizing instead of Global FIFO Sizing (source). @jmitrevs
This looks good and ready to be merged. One change I made (which I think is long overdue) is to ensure multiple iterations of the top function are called when the user doesn't provide any input. All co-simulation results with io_stream are invalid when only one input is passed: not just the FIFO depths, but also latency and II. This PR originally suggested calling the top function twice, but in the past I have observed that for some ResNet blocks I needed to run cosim 4 times (3 times to see the buffers fill up, a 4th time to be sure they stopped increasing). I set it to 5 to be safe. For large models this may take a lot of time, so we shouldn't go to much larger numbers as a default.