
activation intrinsics for neural networks

Open · jalvesz opened this issue 1 year ago

Add activation functions for neural networks. Relates to https://github.com/fortran-lang/stdlib/issues/858

Todo:

  • [x] specs & autodoc
  • [x] tests
  • [ ] examples

Previous art:

  • SciPy special functions collection: https://docs.scipy.org/doc/scipy/reference/special.html
  • torch.nn non-linear activations: https://pytorch.org/docs/stable/nn.html#non-linear-activations-other
  • neural-fortran: https://github.com/modern-fortran/neural-fortran/blob/main/src/nf/nf_activation.f90

cc: @Beliavsky @milancurcic @epagone @jvdp1 @perazz

jalvesz avatar Aug 13 '24 19:08 jalvesz

Before moving forward, any opinions on

  • putting these functions within the "specialfunctions" category?
  • naming the derivative/gradient versions with a <name>_grad suffix (see the sketch after this list)?
  • any other remark?
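
For illustration, a minimal sketch of how such a pair could look (sigmoid is just an example, sp is stdlib's single-precision kind, and these are not necessarily the interfaces of this PR; both are assumed to be procedures of the same module):

elemental module function sigmoid( x ) result( y )
    real(sp), intent(in) :: x
    real(sp) :: y

    y = 1._sp / (1._sp + exp(-x))
end function

elemental module function sigmoid_grad( x ) result( y )
    ! derivative of sigmoid, following the proposed <name>_grad naming
    real(sp), intent(in) :: x
    real(sp) :: y

    y = sigmoid(x) * (1._sp - sigmoid(x))
end function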

jalvesz avatar Aug 23 '24 13:08 jalvesz

Thanks @perazz for your review! Regarding tanh and erf, I wonder if I should remove the references to the intrinsic names and simply keep a reference to the fast approximations, since that is what makes sense for NNs. Also, should these functions stay here or be moved to the intrinsics module?
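
For context, the kind of fast tanh commonly used in NN code is a low-order rational (Padé) approximation with clamping, for example the sketch below (purely illustrative, not necessarily the approximation proposed in this PR; the absolute error is on the order of 1e-4 or better once clamped):

elemental function ftanh_sketch( x ) result( y )
    real(sp), intent(in) :: x
    real(sp) :: y, x2

    ! [7/6] Padé approximant of tanh, from the truncated continued fraction
    x2 = x*x
    y = x * (135135._sp + x2*(17325._sp + x2*(378._sp + x2))) &
          / (135135._sp + x2*(62370._sp + x2*(3150._sp + 28._sp*x2)))
    ! the rational form drifts past +/-1 for large |x|, so saturate it
    y = max(-1._sp, min(1._sp, y))
end function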

Regarding your other points:

I'm fine with the f prefix (fast_ would also be OK). I believe that should be standardized somewhere, for example in the library style guide.

I agree with you and also wonder which one would be preferable; I don't have strong opinions on that.

jalvesz avatar Apr 07 '25 20:04 jalvesz

I think that for now, it makes sense to only have these functions as part of the activation functions submodule (it's natural that they are approximated versions in this context). For other functions overloading intrinsics, there is no single rule:

  • gamma and log_gamma directly overload the intrinsic names
  • stdlib_sum* and stdlib_dot_product* overload intrinsics with a stdlib_ prefix.

So perhaps adding yet another rule (an f* prefix) is not so desirable? Maybe they should be named stdlib_tanh and stdlib_erf, in line with the other overloaded intrinsics. Just a thought.
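
To make the two conventions concrete, an illustrative sketch (hypothetical names, not the interfaces adopted here) of exposing the same specific procedure either way:

module example_naming_sketch
    use stdlib_kinds, only: sp
    implicit none

    ! option 1: extend the intrinsic name, as gamma/log_gamma do; the specific
    ! takes precedence over the intrinsic for matching arguments
    interface tanh
        module procedure fast_tanh_sp
    end interface

    ! option 2: use a stdlib_ prefix, as stdlib_sum/stdlib_dot_product do
    interface stdlib_tanh
        module procedure fast_tanh_sp
    end interface

contains

    elemental function fast_tanh_sp( x ) result( y )
        real(sp), intent(in) :: x
        real(sp) :: y
        ! placeholder body (exact identity), not an actual fast approximation
        y = 2._sp/(1._sp + exp(-2._sp*x)) - 1._sp
    end function

end module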

perazz avatar Apr 09 '25 06:04 perazz

@perazz for the fast functions I propose here to use the naming fast_<>: a stdlib_<> name suggests stdlib's reference implementation, whereas here we deliberately trade accuracy for speed in the context of activation functions. I've added the documentation for those and also extended the tests to check several accuracy levels, while keeping the pass tolerance fixed at a low-accuracy value.
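
As a rough illustration of that testing approach (the names and the tolerance are placeholders, not the PR's actual test code):

program check_fast_tanh_sketch
    use stdlib_kinds, only: sp
    implicit none
    real(sp), parameter :: tol = 1.0e-4_sp   ! fixed, low-accuracy pass tolerance
    real(sp) :: x
    integer :: i

    do i = -50, 50
        x = 0.2_sp * real(i, sp)             ! sample the range [-10, 10]
        if (abs(fast_tanh(x) - tanh(x)) > tol) error stop "fast_tanh outside tolerance"
    end do
    print *, "fast_tanh within tolerance"

contains

    elemental function fast_tanh( x ) result( y )
        ! stand-in for the PR's fast variant (exact identity here)
        real(sp), intent(in) :: x
        real(sp) :: y
        y = 2._sp/(1._sp + exp(-2._sp*x)) - 1._sp
    end function

end program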

jalvesz avatar Apr 14 '25 19:04 jalvesz

A rigorous test would be to print the activation function and derivative values for a range of arguments, write them to a file, and compare them with a reference implementation in Python. How feasible is that?
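
For illustration, the dump side of that could be as simple as the program below (plain-text output, with tanh and 1 - tanh**2 standing in for whichever activation/derivative pair is being checked):

program dump_activation_table
    use stdlib_kinds, only: dp
    implicit none
    integer :: i, u
    real(dp) :: x

    open(newunit=u, file="tanh_table.txt", status="replace", action="write")
    do i = -100, 100
        x = 0.1_dp * real(i, dp)
        ! columns: argument, activation value, derivative value
        write(u, '(3(es24.16,1x))') x, tanh(x), 1._dp - tanh(x)**2
    end do
    close(u)
end program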

I have looked at the code and checked a few of the activation functions and derivatives. They look correct. There are some functions with an optional dim argument that is not referenced, for example

pure module function softmax_r1_sp( x , dim ) result( y )
    real(sp), intent(in) :: x(:)
    real(sp) :: y(size(x))
    integer, intent(in), optional :: dim   ! accepted but never referenced in the body

    y = exp(x - maxval(x))
    y = y / sum(y)
end function

Can unused dim arguments be removed from functions without too much manual work?
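
For contrast, a rank-2 softmax genuinely needs dim; a minimal sketch (not necessarily the PR's implementation) could look like:

pure function softmax_r2_sp( x , dim ) result( y )
    real(sp), intent(in) :: x(:,:)
    integer, intent(in) :: dim       ! dimension along which to normalize (1 or 2)
    real(sp) :: y(size(x,1), size(x,2))
    integer :: j

    if (dim == 1) then
        do j = 1, size(x, 2)
            y(:,j) = exp(x(:,j) - maxval(x(:,j)))
            y(:,j) = y(:,j) / sum(y(:,j))
        end do
    else
        do j = 1, size(x, 1)
            y(j,:) = exp(x(j,:) - maxval(x(j,:)))
            y(j,:) = y(j,:) / sum(y(j,:))
        end do
    end if
end function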

Beliavsky avatar May 01 '25 21:05 Beliavsky

Thanks for the review @Beliavsky. I've reverted the use of dim for the rank-1 cases.

Regarding reading the results from files, one way could be to use .npy files, which stdlib can read; that would take quite some time, though. I would suggest doing it as a separate PR to improve the testing: the current tests use hard-coded reference values for the soft family of functions, which are not elemental and require special treatment for each rank.
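
A rough sketch of what such an .npy-based check could look like (the file name, array layout, and tolerance are assumptions; it relies on load_npy from stdlib_io_npy):

program check_against_npy_reference
    use stdlib_kinds, only: dp
    use stdlib_io_npy, only: load_npy
    implicit none
    real(dp), allocatable :: ref(:,:)   ! assumed layout: column 1 = x, column 2 = reference tanh(x)
    real(dp), parameter :: tol = 1.0e-12_dp

    call load_npy("tanh_reference.npy", ref)
    if (any(abs(tanh(ref(:,1)) - ref(:,2)) > tol)) error stop "mismatch with .npy reference data"
    print *, "all values match the .npy reference"
end program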

jalvesz avatar May 02 '25 21:05 jalvesz

I'll merge this one, which has been hanging for several months. I'll open an issue later on to enhance the tests with an .npy database.

jalvesz avatar May 16 '25 08:05 jalvesz