
Implement More Activation Functions

Open FrostyTheSouthernSnowman opened this issue 1 year ago • 4 comments

The README mentioned that more activation functions are on the roadmap, so I have gotten started. As of publishing, I have only added Threshold, but I plan to go down the list from the [PyTorch docs](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions). Since I'm just going down the list, I'll do HardTanh next, unless someone has one they want written first. I am willing to write documentation if that is needed. Otherwise, I am trying to keep everything as close to the PyTorch docs as possible. If anybody thinks I should follow TensorFlow, JAX, or something else's docs instead, I am totally open to that as well. Here's the current TODO list for what I plan to implement first:

  • [x] threshold
  • [x] relu
  • [x] hardtanh
  • [ ] hardswish
  • [ ] selu
  • [ ] celu
  • [ ] leaky_relu
  • [ ] gelu
  • [ ] softsign
  • [ ] softplus
  • [ ] softmin
  • [x] softmax
  • [x] log_softmax
  • [x] tanh
  • [x] sigmoid
  • [ ] hardsigmoid
  • [ ] mish
  • [ ] normalize

Once I get all of those done, I'll do the rest of the ones listed on the PyTorch docs, assuming my schedule allows for it. I am more than happy to switch out anything on the list.
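For anyone following along, here is a rough NumPy sketch of a few of the definitions above, written from the PyTorch docs (illustration only, not Basalt/Mojo code):

```python
import numpy as np

def threshold(x, threshold_value, value):
    # threshold: keep x where x > threshold_value, otherwise replace with value.
    return np.where(x > threshold_value, x, value)

def leaky_relu(x, negative_slope=0.01):
    # leaky_relu: identity for x >= 0, a small negative slope for x < 0.
    return np.where(x >= 0, x, negative_slope * x)

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # selu: scale * (x for x > 0, alpha * (exp(x) - 1) for x <= 0).
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

The forward passes are the easy part; most of the work is wiring up the backward passes and the corresponding tests.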

Also, I've noticed that the current AttributeVector system leads to a lot of type conversions, which I feel could hinder performance. I don't plan to look into it right now, although I could if it were necessary.

There also seem to be a lot of overloads in the test file. Once I implement a few more activation functions, I'd like to see if I can split them into a few basic categories that can be reused across all the different activation functions.

For the activation functions, the tests for the backward pass (and also the forward pass, apart from test_activations) would go in tests/mojo/test_mlops and tests/python/test_mlops_torch. I wouldn't say there are activation functions to prioritize, but the only ones necessary for now would be leaky_relu, gelu, and selu (of the ones mentioned, I think those are the most used). I think (not sure) the non-approximation (non-tanh) version of gelu is complicated to implement, because we may have to think about how to divide some parts of the code base, so some cleanups will be necessary. So I think those functions would be enough for now.
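For context on the gelu point: the exact variant uses the Gaussian CDF (via erf), while the tanh variant is the approximation PyTorch exposes as approximate="tanh". A plain-Python sketch of the two, for illustration only:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Both forward passes are straightforward; the extra work in the exact version is mostly in the backward pass, since the derivative of x * Phi(x) involves both the normal CDF and the normal PDF.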

andresnowak · May 19 '24 00:05

Ok. Will work on leaky_relu and selu, and I'll do gelu last.

Also, I was looking through the code and realized that there are tests in test_mlops.mojo for the activation functions that I forgot to write. I'll write those for threshold, hard_tanh, and leaky_relu, as well as the tests in test_activations.

There are also tests in tests/python/test_mlops_torch.mojo. So those are the three places: test_activations, test_mlops, and test_mlops_torch.

andresnowak · May 22 '24 16:05

The torch compatibility tests for threshold and hardtanh show that there are some bugs. Are they important enough to be worth debugging, or should I just move on to gelu and selu and delete all the threshold and hardtanh code?

If there are errors when comparing with the torch version, yeah, they should be fixed. But if you want, you can delete them and work on gelu and selu, or you can fix hardtanh and threshold.
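For what it's worth, a minimal spot-check of the kind of comparison such compatibility tests make could look like this (pure PyTorch, a hypothetical sketch rather than the actual tests/python/test_mlops_torch code):

```python
import torch
import torch.nn.functional as F

# Hypothetical spot-check (not the actual Basalt test): a reference hardtanh
# written with clamp should match torch.nn.functional.hardtanh exactly.
x = torch.randn(16)
expected = F.hardtanh(x, min_val=-1.0, max_val=1.0)
reference = torch.clamp(x, min=-1.0, max=1.0)
assert torch.allclose(expected, reference)
```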

andresnowak · May 31 '24 16:05

Ok. I haven't ever heard of them being used anyway, so I'll just get rid of that code and work on gelu and selu.

Is there an op for elementwise min or max, i.e. what is denoted mathematically by, say, min(0, x)? It would be useful in a few places and could simplify some implementations.

Yeah, in autograd/ops/basics there is the max op, or in utils/tensorutils there is the reduce op (it only reduces over all elements or over one dimension).

andresnowak · Jun 07 '24 18:06

Doesn't the current max op get the max value in the tensor? I need something that gets the max or min between two values.

That is part of Mojo: there is already a max op in Mojo's math module (it gets the max value, or min the min value, between two SIMD values).
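In NumPy terms (illustration only; in Basalt this would go through Mojo's SIMD math functions), the elementwise pattern being asked about is:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])

# Elementwise max/min against a scalar, applied per element:
print(np.maximum(0.0, x))  # [0.0, 0.0, 0.0, 1.5]   -> the relu / selu positive branch
print(np.minimum(0.0, x))  # [-2.0, -0.5, 0.0, 0.0] -> min(0, x)
```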

andresnowak · Jun 07 '24 18:06

Thanks! I'll use that, then.

Unfortunately, I'm busier now than anticipated and won't be able to complete the other activation functions. I do, however, have full testing for leaky_relu, and I'd like to at least merge that in if possible.

And also thank you for the contribution (forgot to say, sorry)

andresnowak · Jun 29 '24 03:06

Oh thank you! No worries. Sorry I couldn't finish what I hoped to.