
Different transformer implementation details compared with huggingface's ViT

askerlee opened this issue 3 years ago • 4 comments

Hi Phil, thanks for the great repo. I compared your implementation of ViT with huggingface's (https://github.com/huggingface/transformers/blob/master/src/transformers/models/vit/modeling_vit.py) and found some subtle differences. In particular:

  1. in the attention module, huggingface's ViT applies a dropout right after the softmax (i.e., to the attention probability matrix), but yours doesn't (unless there is a final linear projection, in which case dropout is applied only after that projection).
  2. in the FFN, yours applies a dropout after the first linear transformation, but huggingface's doesn't.
  3. The default dropout rate you adopted is 0.1, whereas huggingface's ViT uses 0.15. I wonder whether you have tried these different dropout placements? Would they produce any noticeable differences? (See the sketch after this list.) Thank you very much.
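For reference, here is a minimal PyTorch sketch contrasting the two placements discussed above. This is illustrative only, not the actual code from either repo; the module names, defaults, and shape conventions are my own assumptions.

```python
import torch
import torch.nn as nn

class AttentionWithProbDropout(nn.Module):
    # huggingface-style: dropout applied to the attention probabilities
    def __init__(self, dim, heads=8, attn_dropout=0.1):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.attn_drop = nn.Dropout(attn_dropout)  # placed right after softmax
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, d // h).transpose(1, 2) for t in qkv)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        attn = self.attn_drop(attn)  # dropout on the attention matrix itself
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

class FeedForwardWithHiddenDropout(nn.Module):
    # vit-pytorch-style FFN: dropout after the first linear + activation
    # (huggingface's BERT-style FFN skips this inner dropout)
    def __init__(self, dim, hidden_dim, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),  # the inner dropout in question
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

# quick shape check
x = torch.randn(2, 197, 768)
print(AttentionWithProbDropout(768, heads=12)(x).shape)    # torch.Size([2, 197, 768])
print(FeedForwardWithHiddenDropout(768, 3072)(x).shape)    # torch.Size([2, 197, 768])
```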

askerlee · Jun 28 '21 05:06

BTW, the transformer layers used by huggingface's ViT are basically a verbatim copy of the transformer used in their BERT model.

askerlee · Jun 28 '21 05:06

I also checked rwightman's pytorch-image-models. The vision transformer implemented there has all of these dropouts, and the dropout rate is 0.1, the same as yours.

askerlee · Jun 28 '21 07:06

Do these fine details matter so much?

shabie · Sep 16 '21 21:09

They probably won't have a big impact, maybe a fraction of a point. I don't have enough compute resources to find out...

askerlee · Sep 17 '21 03:09