Implementation of a new softmax version with PLANE_WISE mode support
This PR introduces a new implementation of the softmax layer, offering increased flexibility and better support for Large Language Models (LLMs) and other applications requiring 2D tensor processing.
Main changes:
- Addition of a `mode` parameter to the `softmax()` and `softmax_gradient()` utility functions.
- Implementation of a PLANE_WISE mode in addition to the existing CHANNEL_WISE mode (see the sketch after this list).
- Update of the `softmax` and `softmaxm` aliases to use the new class.
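To make the difference between the two modes concrete, here is a minimal standalone C++ sketch. It is deliberately independent of the Dlib tensor API defined in this PR: the memory layout, helper names, and the row-wise interpretation of PLANE_WISE (natural for attention score matrices) are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// One sample laid out as k planes of nr x nc floats:
// element (plane i, row r, column c) lives at index (i*nr + r)*nc + c.
long idx(long i, long r, long c, long nr, long nc) { return (i*nr + r)*nc + c; }

// CHANNEL_WISE: at every spatial location (r, c), normalize across the k channels.
void softmax_channel_wise(std::vector<float>& t, long k, long nr, long nc)
{
    for (long r = 0; r < nr; ++r)
    for (long c = 0; c < nc; ++c)
    {
        float m = t[idx(0, r, c, nr, nc)];
        for (long i = 1; i < k; ++i) m = std::max(m, t[idx(i, r, c, nr, nc)]);
        float sum = 0;
        for (long i = 0; i < k; ++i) sum += (t[idx(i, r, c, nr, nc)] = std::exp(t[idx(i, r, c, nr, nc)] - m));
        for (long i = 0; i < k; ++i) t[idx(i, r, c, nr, nc)] /= sum;
    }
}

// PLANE_WISE (assumed here to mean row-wise within each plane): normalize every
// row of every k-plane independently, as needed for attention score matrices.
void softmax_plane_wise(std::vector<float>& t, long k, long nr, long nc)
{
    for (long i = 0; i < k; ++i)
    for (long r = 0; r < nr; ++r)
    {
        float m = t[idx(i, r, 0, nr, nc)];
        for (long c = 1; c < nc; ++c) m = std::max(m, t[idx(i, r, c, nr, nc)]);
        float sum = 0;
        for (long c = 0; c < nc; ++c) sum += (t[idx(i, r, c, nr, nc)] = std::exp(t[idx(i, r, c, nr, nc)] - m));
        for (long c = 0; c < nc; ++c) t[idx(i, r, c, nr, nc)] /= sum;
    }
}

int main()
{
    // Two 2x3 planes.
    std::vector<float> a = {1,2,3, 4,5,6,   6,5,4, 3,2,1}, b = a;
    softmax_channel_wise(a, 2, 2, 3);  // a[0] + a[6] == 1 (the two channels at location (0,0))
    softmax_plane_wise(b, 2, 2, 3);    // b[0] + b[1] + b[2] == 1 (row 0 of plane 0)
    std::cout << a[0] + a[6] << " " << b[0] + b[1] + b[2] << "\n";  // prints 1 1
}
```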
Change details:
- Addition of a `softmax_mode` enumeration with CHANNEL_WISE and PLANE_WISE options.
- Modification of the `softmax_` class to account for the operating mode.
- Update of the `softmax()` and `softmax_gradient()` functions to process data differently based on the chosen mode (a gradient sketch follows this list).
- Adaptation of comments and documentation to reflect the new behaviors.
- Update of unit tests to cover both operating modes.
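For the backward pass that `softmax_gradient()` must handle in the new mode, the per-row computation is the standard softmax Jacobian-vector product: grad_i = s_i * (g_i - sum_j s_j * g_j). The sketch below is again a standalone illustration under the same assumed layout, not the implementation in this PR.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Plane-wise softmax gradient, one row at a time: given the softmax outputs s
// and the upstream gradients g, the gradient w.r.t. the softmax input is
//   grad_i = s_i * (g_i - sum_j(s_j * g_j)).
void softmax_gradient_plane_wise(const std::vector<float>& s,   // softmax outputs
                                 const std::vector<float>& g,   // gradients from the next layer
                                 std::vector<float>& grad,      // gradients w.r.t. the softmax input
                                 long k, long nr, long nc)
{
    grad.resize(s.size());
    for (long i = 0; i < k; ++i)
    for (long r = 0; r < nr; ++r)
    {
        const long base = (i*nr + r)*nc;
        float dot = 0;
        for (long c = 0; c < nc; ++c) dot += s[base+c]*g[base+c];
        for (long c = 0; c < nc; ++c) grad[base+c] = s[base+c]*(g[base+c] - dot);
    }
}

int main()
{
    // One 1x3 plane with a uniform softmax output; a uniform upstream gradient
    // must then produce a zero input gradient, a handy sanity check.
    std::vector<float> s = {1.0f/3, 1.0f/3, 1.0f/3}, g = {1, 1, 1}, grad;
    softmax_gradient_plane_wise(s, g, grad, 1, 1, 3);
    for (float v : grad) std::cout << v << " ";   // prints 0 0 0
    std::cout << "\n";
}
```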
Compatibility:
- This update is backward compatible with existing code using the old `softmax`.
- Users can easily switch to the new `softmax` or `softmaxm` to benefit from the improvements.
Tests:
- New unit tests added to verify correct behavior of both modes.
- Regression tests performed to ensure existing functionalities are not affected.
Ha yeah, same here as the other PR. This ready for review? It's conflicted with master. Maybe they are all ready for review but just need to be merged with master?
Absolutely. Thank you Davis.
Sorry I took so long to come back to this. Been a busy few weeks :|
Anyway, I just pulled this and tried to run the unit tests but got compile errors. I.e. I did `make -j6 dtest && ./dtest --test_dnn` to run the dnn tests. I'm doing it on a machine with cuda so it's building the cuda parts but those have some errors. Be sure to test all these on such a machine :D
Since you merged one of the new layers introduced to implement the attention mechanism in Dlib, I've noticed new branch conflicts appearing. I imagine the compilation issue comes from that, as everything works fine on my end. I'm looking into it again to make the necessary adjustments and I'll let you know.
Make sure you build the unit tests with cuda. I just pulled this branch and tried to build them and got build errors.
I'm closing this PR because I've merged all its changes into the other PR containing the new definitions for `multm_prev`.