Add LayerNorm support for Vivado
Description
This PR adds support for Layer Normalization using either Keras or PyTorch with the Vivado backend in io_parallel mode.
This implementation uses a lookup table for the inverse square root; the lookup table inputs follow a logarithmic distribution for better accuracy.
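As a rough illustration of the idea, here is a minimal NumPy sketch (not the HLS code in this PR). It assumes the logarithmic distribution means the table entries sit on a log-spaced grid of the variance; the table size and value range below are made up:

```python
import numpy as np

# Purely illustrative parameters; the HLS implementation uses fixed-point types
# and a table size configured per layer.
TABLE_SIZE = 1024
VAR_MIN, VAR_MAX = 1e-4, 64.0  # assumed range of the per-feature variance

# Table entries sit on a logarithmic grid, so small variances (where 1/sqrt(x)
# changes fastest) get finer resolution than a uniform grid would give.
grid = np.logspace(np.log10(VAR_MIN), np.log10(VAR_MAX), TABLE_SIZE)
inv_sqrt_table = 1.0 / np.sqrt(grid)

def inv_sqrt_lut(var):
    """Approximate 1/sqrt(var) by indexing the log-spaced table."""
    frac = (np.log10(var) - np.log10(VAR_MIN)) / (np.log10(VAR_MAX) - np.log10(VAR_MIN))
    idx = int(np.clip(frac * (TABLE_SIZE - 1), 0, TABLE_SIZE - 1))
    return inv_sqrt_table[idx]
```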
Tests have been added for both Keras and PyTorch parsing.
Credit is due to @Ethan0Jiang and @LostEcho365 (Zhixing Jiang and Dennis Yin) for their Vivado implementation and Keras parsing support; my contributions were modifying the inverse square root lookup table implementation, implementing PyTorch support, and adding unit tests. (Here's a link to their pre-print.) The original code authors have given permission for their code to be merged into hls4ml.
Linked issue: https://github.com/fastmachinelearning/hls4ml/issues/1109
Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] A new research paper code implementation
Tests
Two unit tests have been added: test/pytest/test_layernorm.py and test/pytest/test_layernorm_pytorch.py
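For reference, a condensed sketch of the kind of check the Keras-side test performs (the real tests in test/pytest/test_layernorm.py are parametrized differently; the shapes, tolerance, and output directory below are illustrative):

```python
import numpy as np
import tensorflow as tf
import hls4ml

# Toy Keras model containing only a LayerNormalization layer
model = tf.keras.Sequential([tf.keras.layers.LayerNormalization(input_shape=(8,))])

x = np.random.rand(100, 8).astype(np.float32)
y_keras = model.predict(x)

# Convert with the Vivado backend in io_parallel mode, as added in this PR
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vivado',
    io_type='io_parallel',
    output_dir='hls4ml_prj_layernorm',  # illustrative path
)
hls_model.compile()
y_hls = hls_model.predict(x)

# The HLS output should match Keras within fixed-point tolerance
np.testing.assert_allclose(y_hls, y_keras, atol=5e-2)
```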
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run
pre-commiton the files I edited or added. - [x] I have added tests that prove my fix is effective or that my feature works.
The pytest failure is the QKeras tests timing out. There are 299 tests being run in that batch, which I guess is too many. Is there a way to reshuffle the batches to avoid the timeout?
pre-commit.ci autofix
For the "broken" diffs, we should look to see what the whitespace error is and fix it before merging.
I did a bit of digging and it's not a whitespace problem; rather, the line endings are improperly encoded. Likely the person we got the code from was using a Windows machine. @rianbrooksflynn, you can install the dos2unix package and just run `dos2unix file.name` to fix this.
We should squash the commits when we merge this.
Thanks @vloncar for your review and apologies that it took me until now to get around to implementing your feedback! I would appreciate a second look from you at this pull request.
pre-commit.ci autofix
Fixed the latest merge conflicts. @vloncar, can you check if Rian's fixes are enough to have this merged?
Any update on the merge?
This is first in line on my PR review TODO list; I expect to have time for it early next week.
> This is first in line on my PR review TODO list; I expect to have time for it early next week.
Thank you very much!
pre-commit.ci autofix
@vloncar I hope this goes in the direction of what you had in mind for the performance validation. I ran synthesis in Vitis 2023.1, 2024.1, and 2025.1 for different LayerNorm input sizes and plotted FFs, LUTs, DSPs, BRAM, latency, and II as a function of that input size. 2024.1 and 2025.1 are basically identical, whereas 2023.1 uses slightly fewer resources but has worse latency.
This uses the default ap_fixed<16,6> precision and the default target part.
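For context, this is roughly the shape of the sweep (a sketch only: the sizes, backend choice, output paths, and report parsing via hls4ml.report.parse_vivado_report are assumptions, not the exact code that produced the plots):

```python
import tensorflow as tf
import hls4ml

# Sweep the LayerNorm input size and synthesize each project; all values are illustrative.
for size in (16, 32, 64, 128, 256):
    model = tf.keras.Sequential([tf.keras.layers.LayerNormalization(input_shape=(size,))])
    config = hls4ml.utils.config_from_keras_model(model, granularity='model')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='Vitis',
        io_type='io_parallel',
        output_dir=f'layernorm_sweep_{size}',
    )
    hls_model.compile()
    hls_model.build(csim=False, synth=True)  # run C synthesis only

    # Collect FF/LUT/DSP/BRAM, latency, and II from the synthesis report
    report = hls4ml.report.parse_vivado_report(f'layernorm_sweep_{size}')
    print(size, report)
```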
Thanks, looks good. Do all reports say the timing is met? (No scheduling warnings, the clock uncertainty is met, etc.)
I did not observe any scheduling warnings. The timing results all look very similar: