
Pre-training code of BLIP-2

Richar-Du opened this issue 2 years ago · 8 comments

Thanks for your awesome work on BLIP-2; it displays surprising abilities when combining an LLM with an image encoder!

Do you plan to release the code to pre-train such a model? We are looking forward to that :)

Richar-Du avatar Feb 05 '23 09:02 Richar-Du

There is extra pre-training logic not yet supported on the main branch of LAVIS. We'd like to update the runner to address this.

We will take an incremental approach and do our best to work on the release, but it won't be immediate.

Thanks for your understanding.

dxli94 avatar Feb 06 '23 01:02 dxli94

> There is extra pre-training/finetuning logic not yet supported on the main branch of LAVIS. We'd like to update the runner to address this.
>
> We will take an incremental approach and do our best to work on the release, but it won't be immediate.
>
> Thanks for your understanding.

Hi, I am trying to fine-tune BLIP2 for my custom dataset.

According to this comment, we only need to execute `train.py` and pass in a runtime config YAML.

Are there any of the details you mentioned that I need to take care of if I want to fine-tune BLIP-2? Thank you!

kaipo-chang avatar Feb 06 '23 06:02 kaipo-chang

The library supports finetuning and no extra work is needed. For now, you may take a look at the finetuning scripts for BLIP-1 as a reference. Detailed hyperparameters can be found in the paper.
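For reference, here is a minimal sketch of a single finetuning step through the LAVIS loading API. It is illustrative only (the image path and caption are placeholders); the official route is `train.py` plus a config, which also handles the LR schedule, AMP, and distributed training:

```python
# Minimal finetuning sketch (illustrative, not the official train.py entry
# point). "example.jpg" and the caption below are placeholders.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load_model_and_preprocess is the public LAVIS model-loading API
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=False, device=device
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)

image = vis_processors["train"](Image.open("example.jpg").convert("RGB"))
samples = {
    "image": image.unsqueeze(0).to(device),
    "text_input": ["a photo of a cat"],  # placeholder caption
}

optimizer.zero_grad()
loss = model(samples)["loss"]  # BLIP-2 models return a dict with a "loss" key
loss.backward()
optimizer.step()
```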

For pre-training, we need some patches to the current main branch.

Thanks.

dxli94 avatar Feb 06 '23 07:02 dxli94

> The library supports finetuning and no extra work is needed. For now, you may take a look at the finetuning scripts for BLIP-1 as a reference. Detailed hyperparameters can be found in the paper.
>
> For pre-training, we need some patches to the current main branch.
>
> Thanks.

Hi, I tried to fine-tune on the COCO captioning task with the `pretrain_opt2.7b` checkpoint. However, the loss keeps coming out as NaN. Below is the config I used; I didn't modify any other code. Can you please help check whether anything is wrong? Thanks!

```yaml
 # Copyright (c) 2022, salesforce.com, inc.
 # All rights reserved.
 # SPDX-License-Identifier: BSD-3-Clause
 # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

model:
  arch: blip2_opt
  model_type: pretrain_opt2.7b
  load_finetuned: False
  use_grad_checkpoint: True

datasets:
  coco_caption: # name of the dataset builder
    vis_processor:
        train:
          name: "blip_image_train"
          image_size: 224
        eval:
          name: "blip_image_eval"
          image_size: 224
    text_processor:
        train:
          name: "blip_caption"
          prompt: "a photo of "
        eval:
          name: "blip_caption"
    build_info:
        images:
            storage: '/Project2/Data/COCO/images/'
            
run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  warmup_lr: 1e-8
  warmup_steps: 1000
  weight_decay: 0.05
  max_epoch: 5
  batch_size_train: 1
  batch_size_eval: 1
  accum_grad_iters: 16
  num_workers: 4

  max_len: 40
  min_len: 8
  num_beams: 5

  seed: 42
  output_dir: "output/BLIP2/Caption_cub"

  amp: True
  resume_ckpt_path: null

  evaluate: False 
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]
  
  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: False
```
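For anyone debugging a NaN loss like this, a hedged sketch (a hypothetical helper, not part of LAVIS) that hooks every module to report where non-finite values first appear; it assumes `model` and `samples` are built as in the finetuning sketch earlier in the thread:

```python
# Hypothetical debugging helper: register a forward hook on every module and
# raise as soon as any module produces NaN/Inf, so the culprit layer is named.
import torch

def nan_hook(module, inputs, output):
    tensors = output if isinstance(output, (tuple, list)) else (output,)
    for t in tensors:
        if torch.is_tensor(t) and not torch.isfinite(t).all():
            raise RuntimeError(
                f"non-finite values first produced by {module.__class__.__name__}"
            )

handles = [m.register_forward_hook(nan_hook) for m in model.modules()]
try:
    loss = model(samples)["loss"]
finally:
    for h in handles:
        h.remove()  # always detach the hooks again
```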

jasper0314-huang avatar Feb 07 '23 02:02 jasper0314-huang

Hi @jasper0314-huang ,

We have now fixed the issue. Can you please pull and try again? Thanks.

dxli94 avatar Feb 10 '23 12:02 dxli94

Hi everyone, please pull https://github.com/salesforce/LAVIS/commit/3ac397aa075c3e60b9521b012dda3660e3e35f1e for updates.

dxli94 avatar Feb 13 '23 00:02 dxli94

Hi, @dxli94 ,

Thanks for your awesome work.

I was wondering why there is only `pretrain_stage2.sh`.

Can I just replace `pretrain_stage2.yaml` with `pretrain_stage1.yaml` for stage-1 pretraining?

ShoufaChen avatar Feb 24 '23 09:02 ShoufaChen

Hi @ShoufaChen ,

I am also running this code. I suspect we cannot just switch to `pretrain_stage1.yaml`, because the code does not currently seem to support pretraining from scratch.

The stage-2 pretraining is initialized from the stage-1 pretrained model, which is downloaded from the released checkpoint.

If you want to pretrain stage 1 from scratch, you may check issue #149; if you just want to continue pretraining from a checkpoint, you can switch to `pretrain_stage1.yaml` (see the sketch below).
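To illustrate the "continue pretraining" route, a hedged sketch of loading the released stage-1 (Q-Former) weights and then overriding them with your own checkpoint before training resumes; the local checkpoint path is hypothetical:

```python
# Hedged sketch: continue stage-1 pretraining from a checkpoint rather than
# from scratch. The local checkpoint path below is hypothetical.
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# name="blip2", model_type="pretrain" loads the stage-1 Q-Former model
# initialized from the released checkpoint
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2", model_type="pretrain", is_eval=False, device=device
)

# Swap in your own stage-1 weights before resuming pretraining
# (LAVIS checkpoints store the weights under the "model" key)
state = torch.load("path/to/your_stage1_checkpoint.pth", map_location="cpu")
model.load_state_dict(state["model"], strict=False)
```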

I am also waiting for the authors to reply and release their code for pretraining from scratch.

Thanks!

ZihaoLin0123 avatar Feb 25 '23 08:02 ZihaoLin0123

BLIP-2 stage-1 pre-training from scratch is now supported; just run `bash run_scripts/blip2/train/pretrain_stage1.sh`. Thank you for your patience and support.

LiJunnan1992 avatar May 08 '23 01:05 LiJunnan1992