Xinhao Li
The reason BIKE has fewer parameters than the original CLIP ViT-L/14 is that in the BIKE model, we only utilize the vision encoder from CLIP and do...
I obtained similar results when testing with the checkpoint you provided, but only about 46 when using the [BLIP-2 checkpoint](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth). What might be the problem?