ImageBind: Confused about ImageNet1k results

Open LinB203 opened this issue 1 year ago • 0 comments

Wonderful work! In Table 2, the top-1 accuracy on ImageNet1k is 77.7%, which is higher than CLIP (OpenCLIP) by 2.2% (2.0%). But ImageBind did not train the vision encoder or the text encoder, so what makes the results different? Or is there anything I missed?

Jun 14 '23 12:06 LinB203
