hls4ml icon indicating copy to clipboard operation
hls4ml copied to clipboard

Generalize pooling accumulator inference

Open jmduarte opened this issue 1 year ago • 4 comments

Currently, pooling uses a wider data type for accumulation inferred based on the input type and the number of elements. However, we only implemented this for ap_fixed<W,I> and ap_int<W> and the template matching fails for ap_uint<W> and ap_ufixed<W,I,Q,O> and ap_fixed<W,I,Q,O> (if Q and O are not the default values). This PR (hopefully) makes it so it matches all those additional types.

Type of change

  • [x] Bug fix (non-breaking change that fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)

Tests

Example before this fix: profiling_global_average_pooling1d (1)

Example after this fix: profiling_global_average_pooling1d (2)

Checklist

  • [x] I have read the guidelines for contributing.
  • [ ] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have made corresponding changes to the documentation.
  • [ ] My changes generate no new warnings.
  • [ ] I have installed and run pre-commit on the files I edited or added.
  • [ ] I have added tests that prove my fix is effective or that my feature works.

@Brainz22 @ddiaz006 @AnthonyAportela @danprim

jmduarte avatar Dec 21 '23 03:12 jmduarte

Do we need a corresponding fix for Quartus?

https://github.com/fastmachinelearning/hls4ml/blob/f41951c57883022362dd4ffbe6ce3ed4efe36811/hls4ml/templates/quartus/firmware/nnet_utils/nnet_pooling.h#L38-L74

jmduarte avatar Dec 21 '23 14:12 jmduarte

pre-commit.ci autofix

jmduarte avatar Dec 29 '23 01:12 jmduarte

Currently, there is an accum_t set for pooling layers which is only effective for io_stream implementation: vivado_backend.py#L376-L380, nnet_pooling_stream.h#L386. (Which could also be problematic at the moment, see #917.)

Maybe we should converge on whether to set pooling accumulator size with accum_t or cpp template matching? @vloncar

calad0i avatar Jan 03 '24 02:01 calad0i

Do we need a corresponding fix for Quartus?

https://github.com/fastmachinelearning/hls4ml/blob/f41951c57883022362dd4ffbe6ce3ed4efe36811/hls4ml/templates/quartus/firmware/nnet_utils/nnet_pooling.h#L38-L74

Actually I think it also needs to be done for vitis (which has its own implementation) in addition to quartus.

jmitrevs avatar Jan 31 '24 23:01 jmitrevs

What did we decide to do about accum_t vs template matching?

jmitrevs avatar Apr 15 '24 15:04 jmitrevs

My general feeling is to prefer the #973 style solution, though implementation details may still need to be addressed.

jmitrevs avatar Apr 15 '24 18:04 jmitrevs

We decided to go with #973 so I will close this PR.

jmitrevs avatar May 16 '24 16:05 jmitrevs