ReLU merge optimizer pass
Description
We introduce an hls4ml optimizer pass that merges a ReLU layer into the Dense/Conv2D layer immediately preceding it, a frequently encountered pattern in neural networks (NNs). NNs in hls4ml are spatially laid out as dataflow stages, one per layer, linked together by FIFOs. These FIFOs can cost BRAMs, LUTs, and/or FFs. By default, hls4ml implements each ReLU as its own dataflow stage. Because each additional dataflow stage costs extra logic and FIFOs, we reduce resource utilization by merging the ReLU activation function into the layer preceding it. Although a layer with the newly merged ReLU functionality uses more logic than before, there is still a net decrease in resources. This optimization was introduced in hls4ml's MLPerf TinyML Benchmark 2022 submission and written up in this paper; the resource reductions it delivers are reported there.
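To make the merge concrete, here is a simplified sketch, not the actual hls4ml `nnet_utils` code: the function name `dense_relu_sketch` is invented, and the config fields (`n_in`, `n_out`, `weight_t`, `bias_t`, `accum_t`) are assumed to follow the usual hls4ml dense-config pattern. It shows the ReLU applied where the Dense result is written out, so no separate activation stage or inter-stage FIFO is needed:

```cpp
// Sketch only: a Dense layer with ReLU fused into the output write.
// Names and config fields are illustrative, not the actual hls4ml templates.
template <class data_T, class res_T, typename CONFIG_T>
void dense_relu_sketch(data_T data[CONFIG_T::n_in], res_T res[CONFIG_T::n_out],
                       typename CONFIG_T::weight_t weights[CONFIG_T::n_in * CONFIG_T::n_out],
                       typename CONFIG_T::bias_t biases[CONFIG_T::n_out]) {
    typename CONFIG_T::accum_t acc[CONFIG_T::n_out];

    // Initialize accumulators with the biases.
    for (unsigned j = 0; j < CONFIG_T::n_out; j++) {
        acc[j] = (typename CONFIG_T::accum_t) biases[j];
    }

    // Standard matrix-vector multiplication.
    for (unsigned i = 0; i < CONFIG_T::n_in; i++) {
        for (unsigned j = 0; j < CONFIG_T::n_out; j++) {
            acc[j] += (typename CONFIG_T::accum_t)(data[i] * weights[i * CONFIG_T::n_out + j]);
        }
    }

    // Fused ReLU: clamp negatives while casting to the output type, instead of
    // handing the result to a separate ReLU dataflow stage over a FIFO.
    for (unsigned j = 0; j < CONFIG_T::n_out; j++) {
        res[j] = (acc[j] > 0) ? (res_T) acc[j] : (res_T) 0;
    }
}
```

The optimizer pass, in effect, has the generated project call a fused function like this in place of a Dense stage followed by a standalone ReLU stage; the Conv2D case is analogous.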
Type of change
This optimization pass was first mentioned in the MLPerf TinyML PR #503.
- [x] New feature (non-breaking change which adds functionality)
- [x] A new research paper code implementation
Tests
This repo contains two test models (a fully-connected NN and a CNN) that can be trained on MNIST, converted to Vivado HLS projects with hls4ml, and synthesized with Vivado HLS 2020.1.
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added tests that prove my fix is effective or that my feature works.
Hi @oliviaweng, thanks a lot for the contribution, it looks great!
It seems some tests are failing: https://gitlab.cern.ch/fastmachinelearning/hls4ml/-/pipelines/4158655
I think most of the failures are relatively easy to fix, e.g., `CONFIG_T::out_t` is missing. Could you take a look and see if you can make them pass? We can also discuss if anything about the errors is unclear.
A question on the approach itself: this will only work for ReLU and will require duplicate overrides for every possible combination of activation, layer, and HLS implementation if we want to expand it. Instead of extending the matrix-vector multiplication kernel to tack on the ReLU computation and then creating duplicate function calls for that, why don't we introduce a new operation, a nop by default, that sits at the end of the dense function, basically where the cast function sits now? Then the config can include a proper implementation of the activation if we choose to merge it in. For example, in the config class we could add `template<class data_T, class CONFIG_T> using activation = nnet::some_activation_or_nop<data_T, CONFIG_T>;`.
This probably sounds unclear, so if you have 15 minutes we can chat over Zoom.
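For concreteness, here is a rough sketch of the nop-activation hook described in the suggestion above; the names (`cast_or_nop`, `relu_op`, `dense_config_example`) and signatures are invented for illustration and are not the actual hls4ml templates:

```cpp
#include "ap_fixed.h"

namespace nnet {

// Default hook: just cast to the result type, i.e. a nop activation.
template <class data_T, class res_T, class CONFIG_T>
struct cast_or_nop {
    static res_T activation(data_T x) {
        #pragma HLS INLINE
        return (res_T) x;
    }
};

// ReLU hook that the optimizer pass would select when a Dense + ReLU pair is merged.
template <class data_T, class res_T, class CONFIG_T>
struct relu_op {
    static res_T activation(data_T x) {
        #pragma HLS INLINE
        return (x > 0) ? (res_T) x : (res_T) 0;
    }
};

} // namespace nnet

// Example per-layer config carrying the hook as a template alias, as in the
// "using activation = ..." idea above (field names illustrative).
struct dense_config_example {
    static const unsigned n_in = 16;
    static const unsigned n_out = 16;
    typedef ap_fixed<16, 6> accum_t;
    // Nop by default; switched to relu_op (or another activation) when merging.
    template <class data_T, class res_T>
    using activation = nnet::relu_op<data_T, res_T, dense_config_example>;
};
```

The dense function would then call `CONFIG_T::template activation<typename CONFIG_T::accum_t, res_T>::activation(acc[i])` where the cast sits today, so merging any activation into any layer only means changing the config rather than duplicating kernels.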
Should we discuss @vloncar's suggestion? It seems like there is quite a bit of interest in this PR so it will be good to get it in.
It was discussed offline and we converged on the proper approach to this.
Any more news on this?
@abijithYayavaram is currently building a more generic version. We aim to push some updates within the next several weeks.
What is the plan for this in general? Is this something we'll tackle with the new code generation framework?
I believe we should revisit this once we have a better way of generating functions in which activations would be merged. There's little gain in general if FIFO depth optimization is applied (and none for `io_parallel`).