torchmd-net
Add GLU, Swish, Mish and SwiGLU
- [x] Added Swish and Mish as possible activation functions.
- [x] Added a Gated Linear Unit (GLU) module (eq. 1 in https://arxiv.org/pdf/1612.08083), with the gate activation function as a parameter.
- [x] Added the SwiGLU activation function (https://arxiv.org/pdf/2002.05202).
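A minimal sketch of the idea behind the two modules above (class and parameter names here are illustrative, not the actual PR code): GLU computes `(x W + b) * act(x V + c)` with the gate activation left as a parameter, and SwiGLU is the special case where the gate uses SiLU/Swish.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GLU(nn.Module):
    """Gated Linear Unit, eq. 1 of arXiv:1612.08083:

        GLU(x) = (x W + b) * act(x V + c)

    The gate activation is a parameter: sigmoid gives the original GLU,
    SiLU/Swish gives SwiGLU (arXiv:2002.05202).
    """

    def __init__(self, in_features, out_features, activation=torch.sigmoid):
        super().__init__()
        self.value = nn.Linear(in_features, out_features)  # x W + b
        self.gate = nn.Linear(in_features, out_features)   # x V + c
        self.activation = activation

    def forward(self, x):
        return self.value(x) * self.activation(self.gate(x))


def swiglu(in_features, out_features):
    """SwiGLU: a GLU whose gate activation is SiLU (Swish)."""
    return GLU(in_features, out_features, activation=F.silu)
```

Note that, unlike a plain element-wise activation, this module owns two `nn.Linear` layers, which is why it needs the input/output sizes up front.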
Unlike the other activation functions (e.g. SiLU), SwiGLU has learnable parameters and requires knowing the input/output shapes at creation time, so it cannot currently be selected as an activation function in a yaml file.
I added it as a tool for NNP development. Mish and Swish, on the other hand, can be selected via a yaml file.
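For reference, Swish and Mish are parameter-free element-wise functions, which is why they can be selected by name in a yaml file. Their standard definitions, sketched in plain Python:

```python
import math


def swish(x: float) -> float:
    # Swish (a.k.a. SiLU): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def mish(x: float) -> float:
    # Mish: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))
```

Both are smooth, non-monotonic alternatives to ReLU; since they take no constructor arguments, they can be swapped in anywhere a stateless activation is expected.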