MetaTransformer
Question about Image Classification
Hi, thanks for your outstanding work! I am trying to use Meta-Transformer for image classification. In the paper, you write: "On image classification, with the help of CLIP [24] text encoder, Meta-Transformer delivers great performances under zero-shot classification". Does this mean that I need the CLIP text encoder to perform image classification, rather than a simple linear layer? Looking forward to your reply!
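To make the question concrete, here is a minimal sketch of the two options as I understand them. It is not taken from the Meta-Transformer code: the features are random NumPy placeholders standing in for encoder outputs, and the names `zero_shot_predict` and `linear_head_predict` are hypothetical helpers I made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, batch = 5, 16, 4

# Placeholder features (assumption): in practice image_feats would come from
# the image encoder and text_feats from a text encoder applied to one prompt
# per class name.
image_feats = rng.normal(size=(batch, dim))
text_feats = rng.normal(size=(num_classes, dim))


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_predict(image_feats, text_feats):
    """Option A: choose the class whose text embedding has the highest
    cosine similarity with the image embedding. No classifier is trained."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    return sims.argmax(axis=-1)


def linear_head_predict(image_feats, weight, bias):
    """Option B: a linear classifier on top of the image features.
    weight and bias would normally be learned from labeled images."""
    logits = image_feats @ weight.T + bias
    return logits.argmax(axis=-1)


# Untrained head with small random weights, just to show the shapes involved.
weight = rng.normal(size=(num_classes, dim)) * 0.01
bias = np.zeros(num_classes)

preds_zero_shot = zero_shot_predict(image_feats, text_feats)
preds_linear = linear_head_predict(image_feats, weight, bias)
```

In this sketch, option A behaves like a linear head whose weights are fixed to the normalized text embeddings, so it needs no labeled images, while option B needs training data for `weight` and `bias`. My question is which of the two the paper's image classification results rely on.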