Add LayerNorm support for Vivado
Description
This PR adds support for Layer Normalization using either Keras or PyTorch with the Vivado backend in io_parallel mode.
This implementation uses a lookup table for the inverse square root; the lookup table inputs follow a logarithmic distribution for better accuracy.
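As a rough illustration of the idea, here is a minimal NumPy sketch (not the HLS code in this PR). It assumes the logarithmic distribution means the table entries sit on a log-spaced grid of the variance; the table size and value range below are made up:

```python
import numpy as np

# Purely illustrative parameters; the HLS implementation uses fixed-point types
# and a table size configured per layer.
TABLE_SIZE = 1024
VAR_MIN, VAR_MAX = 1e-4, 64.0  # assumed range of the per-feature variance

# Table entries sit on a logarithmic grid, so small variances (where 1/sqrt(x)
# changes fastest) get finer resolution than a uniform grid would give.
grid = np.logspace(np.log10(VAR_MIN), np.log10(VAR_MAX), TABLE_SIZE)
inv_sqrt_table = 1.0 / np.sqrt(grid)

def inv_sqrt_lut(var):
    """Approximate 1/sqrt(var) by indexing the log-spaced table."""
    frac = (np.log10(var) - np.log10(VAR_MIN)) / (np.log10(VAR_MAX) - np.log10(VAR_MIN))
    idx = int(np.clip(frac * (TABLE_SIZE - 1), 0, TABLE_SIZE - 1))
    return inv_sqrt_table[idx]
```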
Tests have been added for both Keras and PyTorch parsing.
Credit is due to @Ethan0Jiang and @LostEcho365 (Zhixing Jiang and Dennis Yin) for their Vivado implementation and Keras parsing support; my contributions were modifying the inverse square root lookup table implementation, implementing PyTorch support, and adding unit tests. (Here's a link to their pre-print.) The original code authors have given permission for their code to be merged into hls4ml.
Linked issue: https://github.com/fastmachinelearning/hls4ml/issues/1109
Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] A new research paper code implementation
Tests
Two unit tests have been added: test/pytest/test_layernorm.py and test/pytest/test_layernorm_pytorch.py
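For reference, a condensed sketch of the kind of check the Keras-side test performs (the real tests in test/pytest/test_layernorm.py are parametrized differently; the shapes, tolerance, and output directory below are illustrative):

```python
import numpy as np
import tensorflow as tf
import hls4ml

# Toy Keras model containing only a LayerNormalization layer
model = tf.keras.Sequential([tf.keras.layers.LayerNormalization(input_shape=(8,))])

x = np.random.rand(100, 8).astype(np.float32)
y_keras = model.predict(x)

# Convert with the Vivado backend in io_parallel mode, as added in this PR
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vivado',
    io_type='io_parallel',
    output_dir='hls4ml_prj_layernorm',  # illustrative path
)
hls_model.compile()
y_hls = hls_model.predict(x)

# The HLS output should match Keras within fixed-point tolerance
np.testing.assert_allclose(y_hls, y_keras, atol=5e-2)
```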
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run
pre-commiton the files I edited or added. - [x] I have added tests that prove my fix is effective or that my feature works.
The pytest failure is the QKeras tests timing out. There are 299 tests being run in that batch, which I guess is too many. Is there a way to reshuffle the batches to avoid the timeout?
pre-commit.ci autofix
For the "broken" diffs, we should look to see what the whitespace error is and fix it before merging.
I did a bit of digging and it's not a whitespace problem; rather, the line endings are improperly encoded. Likely the person we got the code from was using a Windows machine. @rianbrooksflynn, you can install the dos2unix package and just run `dos2unix file.name` to fix this.
We should squash the commits when we merge this.
Thanks @vloncar for your review and apologies that it took me until now to get around to implementing your feedback! I would appreciate a second look from you at this pull request.
pre-commit.ci autofix
Fixed the latest merge conflicts. @vloncar, can you check if Rian's fixes are enough to have this merged?
Any update on the merge?
This is first in line on my PR review TODO list; I expect to have time for it early next week.
> This is first in line on my PR review TODO list; I expect to have time for it early next week.
Thank you very much!
pre-commit.ci autofix
@vloncar I hope this goes in the direction of what you had in mind for the performance validation. I ran synthesis in Vitis 2023.1, 2024.1, and 2025.1 for different LayerNorm input sizes and plotted FFs, LUTs, DSPs, BRAM, latency, and II as a function of that input size. 2024.1 and 2025.1 are basically identical, whereas 2023.1 uses slightly fewer resources but has worse latency.
This uses the default ap_fixed<16,6> precision and the default target part.
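For context, this is roughly the shape of the sweep (a sketch only: the sizes, backend choice, output paths, and report parsing via hls4ml.report.parse_vivado_report are assumptions, not the exact code that produced the plots):

```python
import tensorflow as tf
import hls4ml

# Sweep the LayerNorm input size and synthesize each project; all values are illustrative.
for size in (16, 32, 64, 128, 256):
    model = tf.keras.Sequential([tf.keras.layers.LayerNormalization(input_shape=(size,))])
    config = hls4ml.utils.config_from_keras_model(model, granularity='model')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='Vitis',
        io_type='io_parallel',
        output_dir=f'layernorm_sweep_{size}',
    )
    hls_model.compile()
    hls_model.build(csim=False, synth=True)  # run C synthesis only

    # Collect FF/LUT/DSP/BRAM, latency, and II from the synthesis report
    report = hls4ml.report.parse_vivado_report(f'layernorm_sweep_{size}')
    print(size, report)
```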
Thanks, looks good. Do all reports say the timing is met? (No scheduling warnings, the clock uncertainty is met, etc.)
I did not observe any scheduling warnings. The timing results all look very similar: