ImageBind
Confused about ImageNet1k results
Wonderful work! In Table 2, the top-1 accuracy on ImageNet1k is 77.7%, which is higher than CLIP (OpenCLIP) by 2.2% (2.0%). But ImageBind did not train the vision encoder or the text encoder, so what makes the results different? Or is there something I am missing?