Pre-training code of BLIP-2
Thanks for your awesome work on BLIP-2; it shows surprising abilities when combining an LLM with an image encoder!
Do you plan to release the code for pre-training such a model? We are looking forward to it :)
There is extra pre-training logic that is not supported on the main branch of LAVIS at this stage. We'd like to update the runner to address this.
We will take an incremental approach and do our best to work on the release, but it won't be immediate.
Thanks for your understanding.
Hi, I am trying to fine-tune BLIP-2 on my custom dataset.
According to this comment, we only need to execute train.py and pass in a runtime config yaml.
Beyond that, are there any details I need to take care of if I want to fine-tune BLIP-2? Thank you!
The library supports fine-tuning and no extra work is needed. For now, you may take a look at the BLIP-1 fine-tuning scripts as a reference. The detailed hyperparameters can be found in the paper.
For pre-training, we need some patches to the current main branch.
Thanks.
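For concreteness, fine-tuning is launched by passing a config yaml to train.py, the same pattern the BLIP-1 scripts under run_scripts/ use. Below is only a minimal sketch; the config path and GPU count are assumptions, not an official recipe, so adapt them to your checkout and hardware.

```bash
# Sketch of a distributed fine-tuning launch.
# The config path and --nproc_per_node value are assumptions; adjust to your setup.
python -m torch.distributed.run --nproc_per_node=8 train.py \
    --cfg-path lavis/projects/blip2/train/caption_coco_ft.yaml
```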
Hi, I am trying to fine-tune the COCO captioning task from the pretrain_opt2.7b checkpoint. However, the loss keeps coming out as NaN. Below is the config I used; I didn't modify any other code. Could you please help check whether anything is wrong? Thanks!
```yaml
# Copyright (c) 2022, salesforce.com, inc.
# All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

model:
  arch: blip2_opt
  model_type: pretrain_opt2.7b
  load_finetuned: False
  use_grad_checkpoint: True

datasets:
  coco_caption: # name of the dataset builder
    vis_processor:
      train:
        name: "blip_image_train"
        image_size: 224
      eval:
        name: "blip_image_eval"
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"
        prompt: "a photo of "
      eval:
        name: "blip_caption"
    build_info:
      images:
        storage: '/Project2/Data/COCO/images/'

run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  warmup_lr: 1e-8
  warmup_steps: 1000
  weight_decay: 0.05
  max_epoch: 5
  batch_size_train: 1
  batch_size_eval: 1
  accum_grad_iters: 16

  num_workers: 4
  max_len: 40
  min_len: 8
  num_beams: 5

  seed: 42
  output_dir: "output/BLIP2/Caption_cub"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: False
```
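For reference, a config like the one above (with `distributed: False` and `world_size: 1`) would normally be launched as a single-process run along these lines; the yaml filename below is only a placeholder for wherever the pasted config is saved.

```bash
# Single-process launch matching distributed: False / world_size: 1 in the config above.
# caption_coco_nan_debug.yaml is a placeholder path for the pasted config.
python train.py --cfg-path caption_coco_nan_debug.yaml
```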
Hi @jasper0314-huang,
We have now fixed the issue. Can you please pull and try again? Thanks.
Hi everyone, please pull https://github.com/salesforce/LAVIS/commit/3ac397aa075c3e60b9521b012dda3660e3e35f1e for updates.
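For anyone updating an existing checkout, picking up that fix only requires pulling the latest main, or checking out the referenced commit directly:

```bash
# Update a local LAVIS clone to include the fix.
cd LAVIS
git pull origin main
# Or pin to the exact commit linked above:
git checkout 3ac397aa075c3e60b9521b012dda3660e3e35f1e
```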
Hi @dxli94,
Thanks for your awesome work.
I was wondering why there is only a `pretrain_stage2.sh` script.
Can I just replace `pretrain_stage2.yaml` with `pretrain_stage1.yaml` for stage-1 pre-training?
Hi @ShoufaChen,
I am also running this code. I guess we cannot just switch to pretrain_stage1.yaml, because right now the code does not seem to support pre-training from scratch.
The stage-2 pre-training is initialized from the stage-1 pre-trained model, which is downloaded from their released checkpoint.
If you want to pre-train stage 1 from scratch, you may check issue #149; if you just want to continue pre-training from an existing checkpoint, you can simply change the config to pretrain_stage1.yaml (a sketch of the modified launch is at the end of this post).
I am also waiting for the authors to reply and release the code for pre-training from scratch.
Thanks!
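A sketch of what that modified launch could look like, assuming pretrain_stage1.yaml sits next to pretrain_stage2.yaml under lavis/projects/blip2/train/ (the paths and GPU count here are assumptions):

```bash
# Sketch: same launch pattern as pretrain_stage2.sh, but pointing at the stage-1 config.
# Paths and --nproc_per_node are assumptions; adjust to your checkout and hardware.
python -m torch.distributed.run --nproc_per_node=8 train.py \
    --cfg-path lavis/projects/blip2/train/pretrain_stage1.yaml
```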
BLIP-2 stage-1 pre-training from scratch is now supported; just run `bash run_scripts/blip2/train/pretrain_stage1.sh`.
Thank you for your patience and support.