hls4ml
Generalize pooling accumulator inference
Currently, pooling uses a wider data type for accumulation, inferred from the input type and the number of pooled elements. However, this was only implemented for `ap_fixed<W,I>` and `ap_int<W>`, and the template matching fails for `ap_uint<W>`, `ap_ufixed<W,I,Q,O>`, and `ap_fixed<W,I,Q,O>` (when `Q` and `O` are not the default values). This PR (hopefully) makes the matching cover all of those additional types.
Type of change
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
Tests
Example before this fix:
Example after this fix:
Checklist
- [x] I have read the guidelines for contributing.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [ ] My changes generate no new warnings.
- [ ] I have installed and run `pre-commit` on the files I edited or added.
- [ ] I have added tests that prove my fix is effective or that my feature works.
@Brainz22 @ddiaz006 @AnthonyAportela @danprim
Do we need a corresponding fix for Quartus?
https://github.com/fastmachinelearning/hls4ml/blob/f41951c57883022362dd4ffbe6ce3ed4efe36811/hls4ml/templates/quartus/firmware/nnet_utils/nnet_pooling.h#L38-L74
pre-commit.ci autofix
Currently, there is an `accum_t` set for pooling layers which is only effective for the io_stream implementation: vivado_backend.py#L376-L380, nnet_pooling_stream.h#L386. (Which could also be problematic at the moment, see #917.) Maybe we should converge on whether to set the pooling accumulator size with `accum_t` or with C++ template matching? @vloncar
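For comparison, a rough sketch of what the `accum_t` route looks like on the HLS side, assuming a hypothetical config struct whose accumulator width is chosen by the Python backend (this is not the actual nnet_pooling_stream.h code):

```cpp
#include "ap_fixed.h"

// Hypothetical config struct: the backend would compute the accumulator width
// (input width + ceil(log2(pool area))) and write it in as accum_t.
struct pool_config_example {
    static const unsigned pool_height = 2;
    static const unsigned pool_width = 2;
    typedef ap_fixed<18, 8> accum_t; // e.g. for ap_fixed<16,6> inputs and a 2x2 window
};

// Average pooling over one window, accumulating in CONFIG_T::accum_t instead of
// a type deduced from data_T by template matching.
template <class data_T, class res_T, typename CONFIG_T>
res_T pool_window_avg(const data_T window[CONFIG_T::pool_height * CONFIG_T::pool_width]) {
    typename CONFIG_T::accum_t acc = 0;
    for (unsigned i = 0; i < CONFIG_T::pool_height * CONFIG_T::pool_width; i++) {
        #pragma HLS UNROLL
        acc += window[i];
    }
    return acc / (CONFIG_T::pool_height * CONFIG_T::pool_width);
}
```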
> Do we need a corresponding fix for Quartus?
> https://github.com/fastmachinelearning/hls4ml/blob/f41951c57883022362dd4ffbe6ce3ed4efe36811/hls4ml/templates/quartus/firmware/nnet_utils/nnet_pooling.h#L38-L74
Actually, I think it also needs to be done for Vitis (which has its own implementation) in addition to Quartus.
What did we decide to do about accum_t vs template matching?
My general feeling is to prefer the #973-style solution, though implementation details may still need to be addressed.
We decided to go with #973, so I will close this PR.