BLIP
Need a clear understanding of each checkpoint
Hi, thank you for your great work. I am a little confused about the checkpoints posted in the repository. The paper's "Pre-training Details" section says the pre-training dataset contains 14M images, including COCO, Flickr, etc. Does that correspond to the checkpoint at https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_14M.pth?
Also, were both model_base_14M and model_base (https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth) trained with CapFilt?
Thank you for your help!