Apply Tanh activation function to ViT - MLP Head
In the paper:
"In order to stay as close as possible to the original Transformer model, we made use of an additional [class] token, which is taken as image representation. The output of this token is then transformed into a class prediction via a small multi-layer perceptron (MLP) with tanh as non-linearity in the single hidden layer."
https://github.com/lucidrains/vit-pytorch/blob/5699ed7d139062020d1394f0e85a07f706c87c09/vit_pytorch/vit.py#L110-L113
Should a Tanh() activation be applied after the linear layer here, to match the paper?
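For comparison, here is a minimal sketch of a head matching the paper's description (a single hidden layer with tanh), keeping the LayerNorm from the linked implementation. The names and sizes (`dim`, `hidden_dim`, `num_classes`) are illustrative assumptions, not taken from the linked code:

```python
import torch
import torch.nn as nn

dim, hidden_dim, num_classes = 768, 768, 1000  # illustrative sizes, not from vit.py

# Head as described in the paper: a small MLP with tanh as the
# non-linearity in its single hidden layer.
mlp_head = nn.Sequential(
    nn.LayerNorm(dim),
    nn.Linear(dim, hidden_dim),
    nn.Tanh(),
    nn.Linear(hidden_dim, num_classes),
)

x = torch.randn(1, dim)  # stand-in for the [class] token output
logits = mlp_head(x)
print(logits.shape)      # torch.Size([1, 1000])
```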