Reproducing BLIP2 COCO ITM Fine-tuning and Adding New Data

Open yonatanbitton opened this issue 1 year ago • 3 comments

Hey BLIP-2 team,

Thanks for your great work! I've been trying to reproduce the BLIP2 COCO ITM fine-tuning using the resources in your repo:

  1. train.py
  2. blip_image_text_matching.ipynb
  3. train_caption_coco.sh
  4. blip_itm_large.yaml

I couldn't find specific instructions or a command to reproduce the COCO ITM fine-tuning. As I understand it, train_caption_coco.sh is for captioning, and blip_itm_large.yaml is for BLIP1, not BLIP2. I also searched the code and previous GitHub issues. Could you share the exact command or script to run this?

Also, I plan to add new fine-tuning data later. Any tips on incorporating new data would be awesome.

Thanks for your help and your amazing work on BLIP-2!

yonatanbitton avatar May 02 '23 11:05 yonatanbitton

@LiJunnan1992 pinging to see if you have an idea about this issue 🙏 🙌

yonatanbitton avatar May 09 '23 12:05 yonatanbitton

You can create a blip2_retrieval model by modifying blip2_qformer to take into account samples["image_id"] when computing ITC and ITM, as done in blip_retrieval.

Then, you can create a yaml file for training on coco retrieval by following the template of this file.

To add a new dataset, you may refer to the LAVIS documentation.
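
To make the image_id suggestion above more concrete, here is a minimal, framework-free sketch of the idea: captions that share an image_id are treated as positives in the ITC targets, and the same pairs are excluded when sampling hard negatives for ITM. This is an illustration under assumptions, not the LAVIS code: the function names (itc_soft_targets, itc_loss, itm_negative_weights) are hypothetical, plain Python lists stand in for tensors, and the actual blip_retrieval implementation may differ in its details.

```python
import math


def itc_soft_targets(image_ids):
    """Build ITC targets where every caption of the same image is a positive.

    Row i is a distribution over the batch: uniform over all entries whose
    image_id equals image_ids[i], zero elsewhere.
    """
    n = len(image_ids)
    targets = []
    for i in range(n):
        row = [1.0 if image_ids[i] == image_ids[j] else 0.0 for j in range(n)]
        total = sum(row)  # always >= 1 because entry i matches itself
        targets.append([v / total for v in row])
    return targets


def itc_loss(sim, targets):
    """Mean cross-entropy between softmax(sim) rows and the soft targets."""
    loss = 0.0
    for sim_row, tgt_row in zip(sim, targets):
        m = max(sim_row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in sim_row))
        loss += -sum(t * (s - log_z) for s, t in zip(sim_row, tgt_row))
    return loss / len(sim)


def itm_negative_weights(sim, image_ids):
    """Unnormalized sampling weights for ITM hard negatives.

    Weights grow with similarity (harder negatives are more likely), but any
    pair sharing an image_id gets weight 0 so a true match is never sampled
    as a negative.
    """
    weights = []
    for i, sim_row in enumerate(sim):
        m = max(sim_row)
        row = [
            0.0 if image_ids[i] == image_ids[j] else math.exp(s - m)
            for j, s in enumerate(sim_row)
        ]
        weights.append(row)
    return weights


# Toy batch: the first two captions belong to the same image (id 7).
image_ids = [7, 7, 9]
sim = [[2.0, 1.0, 0.5], [1.0, 2.0, 0.5], [0.1, 0.2, 3.0]]
targets = itc_soft_targets(image_ids)
loss = itc_loss(sim, targets)
neg_weights = itm_negative_weights(sim, image_ids)
```

Without the image_id check, the second caption of image 7 would be counted as a negative for the first one, which penalizes a correct match. Note that if every item in a batch shares one image_id, all negative weights are zero, so a real implementation would need to handle that case.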

LiJunnan1992 avatar May 09 '23 23:05 LiJunnan1992

Could you please release the code so that we can reproduce the result? I cannot make it work based on this information. Thanks much!

shengyi4 avatar May 15 '23 00:05 shengyi4

@LiJunnan1992 sorry for the late response, but I also can't reproduce your results based on this information. Is there any chance you could share your implementation so that we can first reproduce the ITM results? Later we can try to understand how to adapt it to new data. Sharing it would enable several valuable extensions of the BLIP2 model 🙏 (also to follow up on this Tweet). Thank you 🙌

yonatanbitton avatar Jul 02 '23 13:07 yonatanbitton

@yonatanbitton @shengyi4 You can now finetune for retrieval by running this script: https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip2/train/train_retrieval_coco.sh

LiJunnan1992 avatar Jul 03 '23 07:07 LiJunnan1992

Thank you very much, I'll check it out.

yonatanbitton avatar Jul 03 '23 20:07 yonatanbitton