Implementation of a new softmax version with PLANE_WISE mode support
This PR introduces a new implementation of the softmax layer, offering increased flexibility and better support for Large Language Models (LLMs) and other applications requiring 2D tensor processing.
Main changes:
- Addition of a `mode` parameter to the `softmax()` and `softmax_gradient()` utility functions.
- Implementation of a PLANE_WISE mode in addition to the existing CHANNEL_WISE mode (see the sketch after this list).
- Update of the `softmax` and `softmaxm` aliases to use the new class.
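To make the difference between the two modes concrete, here is a minimal standalone C++ sketch. It is deliberately independent of the Dlib tensor API defined in this PR: the memory layout, helper names, and the row-wise interpretation of PLANE_WISE (natural for attention score matrices) are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// One sample laid out as k planes of nr x nc floats:
// element (plane i, row r, column c) lives at index (i*nr + r)*nc + c.
long idx(long i, long r, long c, long nr, long nc) { return (i*nr + r)*nc + c; }

// CHANNEL_WISE: at every spatial location (r, c), normalize across the k channels.
void softmax_channel_wise(std::vector<float>& t, long k, long nr, long nc)
{
    for (long r = 0; r < nr; ++r)
    for (long c = 0; c < nc; ++c)
    {
        float m = t[idx(0, r, c, nr, nc)];
        for (long i = 1; i < k; ++i) m = std::max(m, t[idx(i, r, c, nr, nc)]);
        float sum = 0;
        for (long i = 0; i < k; ++i) sum += (t[idx(i, r, c, nr, nc)] = std::exp(t[idx(i, r, c, nr, nc)] - m));
        for (long i = 0; i < k; ++i) t[idx(i, r, c, nr, nc)] /= sum;
    }
}

// PLANE_WISE (assumed here to mean row-wise within each plane): normalize every
// row of every k-plane independently, as needed for attention score matrices.
void softmax_plane_wise(std::vector<float>& t, long k, long nr, long nc)
{
    for (long i = 0; i < k; ++i)
    for (long r = 0; r < nr; ++r)
    {
        float m = t[idx(i, r, 0, nr, nc)];
        for (long c = 1; c < nc; ++c) m = std::max(m, t[idx(i, r, c, nr, nc)]);
        float sum = 0;
        for (long c = 0; c < nc; ++c) sum += (t[idx(i, r, c, nr, nc)] = std::exp(t[idx(i, r, c, nr, nc)] - m));
        for (long c = 0; c < nc; ++c) t[idx(i, r, c, nr, nc)] /= sum;
    }
}

int main()
{
    // Two 2x3 planes.
    std::vector<float> a = {1,2,3, 4,5,6,   6,5,4, 3,2,1}, b = a;
    softmax_channel_wise(a, 2, 2, 3);  // a[0] + a[6] == 1 (the two channels at location (0,0))
    softmax_plane_wise(b, 2, 2, 3);    // b[0] + b[1] + b[2] == 1 (row 0 of plane 0)
    std::cout << a[0] + a[6] << " " << b[0] + b[1] + b[2] << "\n";  // prints 1 1
}
```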
Change details:
- Addition of a `softmax_mode` enumeration with CHANNEL_WISE and PLANE_WISE options.
- Modification of the `softmax_` class to account for the operating mode.
- Update of the `softmax()` and `softmax_gradient()` functions to process data differently based on the chosen mode (a gradient sketch follows this list).
- Adaptation of comments and documentation to reflect the new behaviors.
- Update of unit tests to cover both operating modes.
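For the backward pass that `softmax_gradient()` must handle in the new mode, the per-row computation is the standard softmax Jacobian-vector product: grad_i = s_i * (g_i - sum_j s_j * g_j). The sketch below is again a standalone illustration under the same assumed layout, not the implementation in this PR.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Plane-wise softmax gradient, one row at a time: given the softmax outputs s
// and the upstream gradients g, the gradient w.r.t. the softmax input is
//   grad_i = s_i * (g_i - sum_j(s_j * g_j)).
void softmax_gradient_plane_wise(const std::vector<float>& s,   // softmax outputs
                                 const std::vector<float>& g,   // gradients from the next layer
                                 std::vector<float>& grad,      // gradients w.r.t. the softmax input
                                 long k, long nr, long nc)
{
    grad.resize(s.size());
    for (long i = 0; i < k; ++i)
    for (long r = 0; r < nr; ++r)
    {
        const long base = (i*nr + r)*nc;
        float dot = 0;
        for (long c = 0; c < nc; ++c) dot += s[base+c]*g[base+c];
        for (long c = 0; c < nc; ++c) grad[base+c] = s[base+c]*(g[base+c] - dot);
    }
}

int main()
{
    // One 1x3 plane with a uniform softmax output; a uniform upstream gradient
    // must then produce a zero input gradient, a handy sanity check.
    std::vector<float> s = {1.0f/3, 1.0f/3, 1.0f/3}, g = {1, 1, 1}, grad;
    softmax_gradient_plane_wise(s, g, grad, 1, 1, 3);
    for (float v : grad) std::cout << v << " ";   // prints 0 0 0
    std::cout << "\n";
}
```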
Compatibility:
- This update is backward compatible with existing code using the old `softmax`.
- Users can easily switch to the new `softmax` or `softmaxm` to benefit from the improvements.
Tests:
- New unit tests added to verify correct behavior of both modes.
- Regression tests performed to ensure existing functionalities are not affected.
Ha yeah, same here as the other PR. This ready for review? It's conflicted with master. Maybe they are all ready for review but just need to be merged with master?
Absolutely. Thank you Davis.
Sorry I took so long to come back to this. Been a busy few weeks :|
Anyway, I just pulled this and tried to run the unit tests but got compile errors. I.e. I did `make -j6 dtest && ./dtest --test_dnn` to run the dnn tests. I'm doing it on a machine with cuda so it's building the cuda parts but those have some errors. Be sure to test all these on such a machine :D
Since you merged one of the new layers introduced to implement the attention mechanism in Dlib, I've noticed new branch conflicts appearing. I imagine the compilation issue comes from that, as everything works fine on my end. I'm looking into it again to make the necessary adjustments and I'll let you know.
Make sure you build the unit tests with cuda. I just pulled this branch and tried to build them and got build errors.
I'm closing this PR because I've merged all its changes into the other PR containing the new definitions for `multm_prev`.