# Argmax Softmax
## Description
:memo: A new implementation of the Softmax activation for the Vivado and Quartus backends.
- Softmax is a monotonically increasing function; therefore, in classification problems, the predicted class is the same as the largest output before the Softmax activation (i.e. Softmax preserves the argmax).
- The default implementation remains `stable`, as we are sometimes interested in the normalized output probabilities.
Two implementations are added:
- ArgMax - Returns a one-hot encoded vector. This would be set through the hls4ml config, for example `hls_config['LayerName']['softmax']['strategy'] = 'argmax'` (very similar to what we do now with the stable and latency implementations of Softmax).
- Logits - Removes the Softmax layer. Again handled through the hls4ml config, via an optional boolean attribute `skip` (defaults to false), for example: `hls_config['LayerName']['softmax']['skip'] = True`. An optimizer pass removes the Softmax node from the model graph and rewires the network. A configuration sketch follows this list.
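To make the proposed keys concrete, here is a minimal, hypothetical configuration sketch. The toy model, the layer name `softmax`, and the output directory are placeholders, and the `strategy`/`skip` keys are the ones proposed in this PR rather than existing hls4ml options:

```python
import hls4ml
from tensorflow import keras

# Toy classifier; 'softmax' is the layer name referenced in the config below.
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(16,)),
    keras.layers.Activation('softmax', name='softmax'),
])

# 'name' granularity exposes the per-layer 'LayerName' section of the config.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Option 1: one-hot ArgMax implementation of Softmax.
config['LayerName']['softmax']['strategy'] = 'argmax'

# Option 2 (instead of option 1): skip Softmax entirely and emit the logits.
# config['LayerName']['softmax']['skip'] = True

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='hls4ml_prj')
```

Both options leave the rest of the conversion flow unchanged; only the handling of the `softmax` layer differs.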
## Type of change
- [x] New feature (non-breaking change which adds functionality)
Tests
- Expanded `test/pytest/test_softmax.py` with the new implementations
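For illustration, a pytest-style sketch of how the new option might be exercised; this is a hypothetical test body, not the actual contents of `test/pytest/test_softmax.py`, and it assumes the `strategy` key proposed above:

```python
import numpy as np
import pytest
import hls4ml
from tensorflow import keras


@pytest.mark.parametrize('strategy', ['stable', 'argmax'])
def test_softmax_strategy(strategy):
    # Single softmax layer, as in the simple test network discussed below.
    model = keras.Sequential([
        keras.layers.Activation('softmax', name='softmax', input_shape=(10,)),
    ])
    config = hls4ml.utils.config_from_keras_model(model, granularity='name')
    config['LayerName']['softmax']['strategy'] = strategy
    hls_model = hls4ml.converters.convert_from_keras_model(
        model, hls_config=config, output_dir=f'softmax_{strategy}')
    hls_model.compile()

    x = np.random.rand(100, 10).astype(np.float32)
    # Whatever the strategy, the predicted class must match the Keras argmax.
    y_keras = model.predict(x)
    y_hls = hls_model.predict(x)
    assert np.array_equal(np.argmax(y_hls, axis=-1), np.argmax(y_keras, axis=-1))
```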
## Checklist
- [x] I have read the guidelines for contributing.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added tests that prove my fix is effective or that my feature works.
I like the idea, but I think the implementation could be handled differently.
Am I right in thinking that this "strategy argmax" effectively replaces the softmax function with a linear activation? If so, this feels like something that should be done with an optimizer pass rather than in the HLS, i.e. one that looks for a softmax at the end of the model and deletes it. It's also confusing to still call this softmax. From the description I expected a layer that returns the actual argmax value, which I think would be nice. There you'd need some HLS to implement that function, and I'd say also an optimizer pass to insert it. We could even implement a Python 'argmax layer' for Keras, so that users can add it to the end of the model to say explicitly that this is how the model should be converted; a sketch of such a layer follows.
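A minimal sketch of what such a Keras layer could look like; the class name `ArgMax` is hypothetical, and nothing with this name exists in Keras or hls4ml. It returns a one-hot vector rather than the bare index, so the output shape matches the input shape:

```python
import tensorflow as tf


class ArgMax(tf.keras.layers.Layer):
    """One-hot marker of the largest input along the last axis."""

    def call(self, inputs):
        indices = tf.argmax(inputs, axis=-1)
        # Same depth as the input, so the output shape is unchanged.
        return tf.one_hot(indices, depth=tf.shape(inputs)[-1])
```

A user would then append `ArgMax()` after the final layer (in place of, or after, the Softmax), and a converter-side optimizer pass could map it onto the HLS implementation.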
I also think it would not only be softmax that could be replaced in this way, but any activation function that meets the criteria you described (which is most of them).
I agree with Sioni's comments, and I would add that there could be two different optimizer passes depending on the choice of strategy (a sketch of the second one follows this list):
- "argmax" -> replaces the softmax with an argmax layer (returning either a one-hot encoded vector or the max index)
- "logits" -> removes the softmax layer entirely and returns the logits
I agree with the comments. The reason I didn't remove the Softmax layer completely is that the test uses a simple one-layer network, which wouldn't really work if the layer were removed. @thesps you're right, Softmax is essentially replaced with a linear activation in this PR. My proposed change is then two "implementations":
- ArgMax - Returns a one-hot encoded vector. This would be set through the hls4ml config, for example `hls_config['LayerName']['softmax']['strategy'] = 'argmax'` (very similar to what we do now with the stable and latency implementations of Softmax). I would still handle this implementation through HLS, as it acts as a (rough) approximation of Softmax. I would avoid returning only the index of the argmax, as that changes the dimensionality of the output (it can be done, but doesn't feel correct).
- Logits - Removes the Softmax layer. Again handled through the hls4ml config, via an optional boolean attribute `skip` (defaults to false), for example: `hls_config['LayerName']['softmax']['skip'] = True`. There would be an optimizer that removes the Softmax node from the model graph and rewires the network.
I would avoid having a custom Keras layer, for two reasons:
- Users might not be aware of our implementation (we should include it in the documentation & tutorials) and they might already have a pre-trained model using Keras Softmax (this is a very weak point, as replacing a Keras layer with a custom one is a few lines of code)
- hls4ml is not Keras specific, there are converters for PyTorch and ONNX as well
Apologies for the late follow-up review. This looks really nice now; all the implemented options are useful. The remaining problems are:
- some tests fail, including ones testing the new feature. Could you look into that?
- there are some merge conflicts, although they look easy enough to resolve
Since Benjamin is back at university now, I resolved the two issues. Can you merge now, @thesps?