activation intrinsics for neural networks
Add activation functions for neural networks. Relates to https://github.com/fortran-lang/stdlib/issues/858
Todo:
- [x] specs & autodoc
- [x] tests
- [ ] examples
Previous art:
- SciPy special functions collection: https://docs.scipy.org/doc/scipy/reference/special.html
- torch nn: https://pytorch.org/docs/stable/nn.html#non-linear-activations-other
- neural-fortran: https://github.com/modern-fortran/neural-fortran/blob/main/src/nf/nf_activation.f90
cc: @Beliavsky @milancurcic @epagone @jvdp1 @perazz
Before moving forward, any opinions on
- putting these functions within the "specialfunctions" category?
- naming of the derivative/gradient versions with `<name>_grad`? (see the sketch after this list)
- any other remarks?
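A minimal sketch of how the proposed `<name>_grad` convention would pair an activation with its derivative. The names here (`silu`, `silu_grad`) and the module are purely illustrative assumptions, not the PR's actual interfaces:

```fortran
module activation_naming_sketch
    implicit none
    integer, parameter :: sp = kind(1.0)
contains
    ! hypothetical activation: SiLU, x * sigmoid(x)
    elemental function silu(x) result(y)
        real(sp), intent(in) :: x
        real(sp) :: y
        y = x / (1.0_sp + exp(-x))
    end function silu

    ! its derivative, following the proposed <name>_grad naming
    elemental function silu_grad(x) result(dy)
        real(sp), intent(in) :: x
        real(sp) :: dy, s
        s  = 1.0_sp / (1.0_sp + exp(-x))
        dy = s * (1.0_sp + x * (1.0_sp - s))
    end function silu_grad
end module activation_naming_sketch
```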
Thanks @perazz for your review! Regarding tanh and erf, I wonder if I should actually remove the references to the intrinsic names and simply keep a reference to the fast approximations, since that is what makes sense for NNs. Also, should these functions stay here or be moved to the intrinsics module?
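For context, a sketch of the kind of low-accuracy tanh approximation being discussed (a Padé-style rational form, accurate to roughly the percent level). This is illustrative only, under my own assumptions, and not the implementation in the PR:

```fortran
module fast_tanh_sketch
    implicit none
    integer, parameter :: sp = kind(1.0)
contains
    ! illustrative rational approximation of tanh; the rational form
    ! reaches 1 at |x| = 3, so saturate to +/-1 beyond that point
    elemental function ftanh(x) result(y)
        real(sp), intent(in) :: x
        real(sp) :: y, x2
        if (abs(x) >= 3.0_sp) then
            y = sign(1.0_sp, x)
        else
            x2 = x * x
            y = x * (27.0_sp + x2) / (27.0_sp + 9.0_sp * x2)
        end if
    end function ftanh
end module fast_tanh_sketch
```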
Regarding the prefix: I'm fine with the `f` prefix (`fast_` would also be OK). I believe that should be standardized somewhere, for example in the library style guide?
I agree with you, and I also wonder which one would be preferable; I don't have strong opinions on that.
I think that for now, it makes sense to only have these functions as part of the activation functions submodule (it's natural that they are approximated versions in this context). For other functions overloading intrinsics, there is no single rule:
- `gamma`, `log_gamma` directly overwrite the intrinsic names
- `stdlib_sum*`, `stdlib_dot_product*` intrinsics have the `stdlib_` prefix.
So perhaps adding another rule (an `f*` prefix) is not so desirable? Maybe they should be named `stdlib_tanh` and `stdlib_erf`, in line with the other overloaded intrinsics. Just a thought.
@perazz So for the fast functions I propose here to use the naming `fast_<>`, as I felt that `stdlib_<>` would suggest stdlib's reference implementation, whereas here we are proposing degraded accuracy for the sake of activation functions. I've added the documentation for those and also extended the tests to cover different accuracies, while leaving the tolerance fixed at a low-accuracy value.
A rigorous test would be to print the activation function and derivative values for a range of arguments, write them to a file, and compare them with a reference implementation in Python. How feasible is that?
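A minimal sketch of the dump side of such a test, assuming a plain text file and a simple evaluation grid (the file name, grid, and the use of `tanh` as the function under test are all illustrative):

```fortran
program dump_activation_values
    implicit none
    integer, parameter :: sp = kind(1.0)
    integer :: i, u
    real(sp) :: x
    open(newunit=u, file='tanh_values.txt', status='replace', action='write')
    do i = -80, 80
        x = 0.1_sp * real(i, sp)
        ! replace tanh with the activation/derivative under test
        write(u, '(2es24.16)') x, tanh(x)
    end do
    close(u)
end program dump_activation_values
```

A Python reference (SciPy, PyTorch) could then read this file and check the values independently.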
I have looked at the code and checked a few of the activation functions and derivatives; they look correct. There are some functions with an optional `dim` argument that is not referenced, for example:
```fortran
pure module function softmax_r1_sp( x , dim ) result( y )
    real(sp), intent(in) :: x(:)
    real(sp) :: y(size(x))
    integer, intent(in), optional :: dim   ! optional dim is never referenced
    ! shift by the maximum for numerical stability, then normalize
    y = exp(x - maxval(x))
    y = y / sum(y)
end function
```
Can the unused `dim` arguments be removed from these functions without too much manual work?
Thanks for the review @Beliavsky. I've reverted the use of `dim` for the rank-1 cases.
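Roughly, the rank-1 specific then reduces to the following (a sketch of the change, not necessarily the exact PR code):

```fortran
pure module function softmax_r1_sp( x ) result( y )
    real(sp), intent(in) :: x(:)
    real(sp) :: y(size(x))
    ! shift by the maximum for numerical stability, then normalize
    y = exp(x - maxval(x))
    y = y / sum(y)
end function
```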
Regarding reading the results from files, one way could be to use `.npy` files, which stdlib can read. That would take quite some time, though. I would suggest doing it as a separate PR for improving the testing; the current tests use hard-coded reference values for the soft-* family, which are not elemental and require special rank-wise treatment.
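A rough sketch of what such a follow-up test could look like, assuming stdlib's `stdlib_io_npy` module with its `load_npy` interface and `stdlib_kinds`; the file names, the tolerance, and the use of `tanh` as a stand-in are illustrative:

```fortran
program check_against_npy_reference
    use stdlib_kinds, only: sp
    use stdlib_io_npy, only: load_npy
    implicit none
    real(sp), allocatable :: x(:), y_ref(:)
    real(sp), parameter :: tol = 1.0e-2_sp   ! loose tolerance for the fast variants
    ! reference values generated beforehand by a Python implementation
    call load_npy('activation_inputs.npy', x)
    call load_npy('activation_reference.npy', y_ref)
    ! tanh stands in for whichever activation is under test
    if (any(abs(tanh(x) - y_ref) > tol)) error stop 'mismatch with reference values'
end program check_against_npy_reference
```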
I'll merge this one, as it has been hanging for several months. I'll add an issue later on to enhance the tests with an `.npy` database.