TensorFlow 2.x implementation of the Vision Transformer model

Vision Transformer

Unofficial TensorFlow 2.x implementation of the Transformer-based image classification model proposed in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". The paper is currently under double-blind review.
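The "16x16 words" in the paper title refers to cutting an image into fixed-size patches and treating each flattened patch as a token. The snippet below is a minimal NumPy sketch of that step only, not this package's code: the helper name `image_to_patches` and the 224x224x3 input size are assumptions chosen for illustration.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; not part of this package's API.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (H, W, C) -> (rows, P, cols, P, C): split both spatial axes into blocks.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two block indices to the front: (rows, cols, P, P, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector: (rows * cols, P * P * C).
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: one 224x224 RGB image.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = image_to_patches(image)
print(patches.shape)  # (196, 768)
```

With these assumed sizes, a 224x224 image yields 14 x 14 = 196 patches, each of length 16 x 16 x 3 = 768. In the model, each such vector would then be linearly projected to the Transformer's embedding width.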

The GELU implementation is taken from the latest master branch of EchoAI.
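For reference, GELU is defined as x * Φ(x), where Φ is the standard normal CDF. The sketch below shows the exact form and the widely used tanh approximation in plain Python. It is illustrative only: which variant EchoAI uses is not stated here, and the function names `gelu_exact` and `gelu_tanh` are assumptions.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU (illustrative; names are hypothetical)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Compare the two forms at a few sample points.
for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(value, gelu_exact(value), gelu_tanh(value))
```

The two forms agree closely over typical activation ranges; the approximation is mainly a way to avoid evaluating `erf`.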