Pre-Training BLIP2 Log
Could you release the pre-training logs for BLIP-2 stages 1 and 2? When I tried to retrain the model, the loss seemed difficult to reduce.
Hi, we do not fully support pre-training BLIP-2 from scratch. Our current implementation always loads a pre-trained BLIP-2 checkpoint by default. This could explain why you find the loss difficult to reduce: the model is already pre-trained.
We are incrementally working on supporting pre-training from scratch. In the meantime, feel free to modify our source code to achieve this, and you are welcome to post your solutions here. Thanks!
I have modified the code to pre-train from scratch so that it does not load a pre-trained BLIP-2 checkpoint, but the loss is still difficult to reduce.
I only changed the part that loads the pre-trained checkpoint, so the model no longer loads it, and I used 8 A100 GPUs for training.
Could you share your modifications and the loss logs from before and after your change?
# blip2_qformer.py, from_config()
load_finetuned = cfg.get("load_finetuned", True)
load_pretrained = cfg.get("load_pretrained", True)

if load_finetuned or load_pretrained:
    model.load_checkpoint_from_config(cfg)

return model
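Given the flags read above, one way to skip loading the released checkpoint without touching the source is to override them in the model section of the pre-training config. A minimal sketch, where the arch and model_type values are assumptions; keep whatever your stage-1 config already specifies:

model:
  arch: blip2            # assumed stage-1 arch; keep your config's value
  model_type: pretrain   # assumed; keep your config's value
  load_pretrained: False # skip loading the pre-trained BLIP-2 checkpoint
  load_finetuned: False  # skip loading any fine-tuned checkpoint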
Stage 1 logs for 3M images:
{"train_lr": "0.000", "train_loss": "7.367"}
{"train_lr": "0.000", "train_loss": "5.705"}
{"train_lr": "0.000", "train_loss": "5.368"}
{"train_lr": "0.000", "train_loss": "5.168"}
{"train_lr": "0.000", "train_loss": "5.013"}
{"train_lr": "0.000", "train_loss": "4.878"}
{"train_lr": "0.000", "train_loss": "4.758"}
{"train_lr": "0.000", "train_loss": "4.662"}
{"train_lr": "0.000", "train_loss": "4.584"}
{"train_lr": "0.000", "train_loss": "4.528"}

Stage 1 logs for 20M images:
{"train_lr": "0.000", "train_loss": "8.089"}
{"train_lr": "0.000", "train_loss": "7.400"}
{"train_lr": "0.000", "train_loss": "7.237"}
{"train_lr": "0.000", "train_loss": "7.153"}
{"train_lr": "0.000", "train_loss": "7.090"}
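To inspect convergence, one can parse the per-epoch stats the run writes out. A minimal sketch, assuming one JSON object per line in log.txt under the output_dir given in the config below:

import json

# Print the train_loss curve from the run's log file.
# The path assumes output_dir: "output/BLIP2/Pretrain_stage1" as in the config below.
log_path = "output/BLIP2/Pretrain_stage1/log.txt"

with open(log_path) as f:
    for epoch, line in enumerate(f):
        line = line.strip()
        if not line:
            continue
        stats = json.loads(line)
        if "train_loss" in stats:
            print(f"epoch {epoch}: train_loss = {float(stats['train_loss']):.3f}")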
run:
  task: image_text_pretrain
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-4
  min_lr: 1e-5
  warmup_lr: 1e-6

  weight_decay: 0.05
  max_epoch: 10
  batch_size_train: 100
  batch_size_eval: 64
  num_workers: 4
  warmup_steps: 5000

  seed: 42
  output_dir: "output/BLIP2/Pretrain_stage1"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
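For reference, a rough sketch of what linear_warmup_cosine_lr does with the hyperparameters above (an illustration, not the exact LAVIS scheduler):

import math

def lr_at(cur_step, cur_epoch, max_epoch=10, init_lr=1e-4, min_lr=1e-5,
          warmup_lr=1e-6, warmup_steps=5000):
    # Illustrative linear-warmup + cosine-decay schedule using the
    # hyperparameters from the config above (not the exact LAVIS code).
    if cur_epoch == 0 and cur_step < warmup_steps:
        # Linear warmup from warmup_lr up to init_lr over the first warmup_steps steps.
        return warmup_lr + (init_lr - warmup_lr) * cur_step / max(1, warmup_steps)
    # Cosine decay from init_lr down to min_lr over the remaining epochs.
    progress = cur_epoch / max(1, max_epoch)
    return min_lr + 0.5 * (init_lr - min_lr) * (1.0 + math.cos(math.pi * progress))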
The loss decreases normally, as expected. Note that the logged loss is averaged over a short window, so some fluctuation is normal.
Thanks. Could you tell me what loss value indicates good convergence for stage 1 and stage 2? When I train the model for 10 epochs, the loss still seems high.
The logged loss is the sum of the three pre-training losses.
The value of the loss depends on the pre-training dataset, so it is hard to define a threshold for a "good" loss. You may use a validation dataset to evaluate the model's performance during pre-training.
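For reference, the three losses being summed are the stage-1 objectives from the BLIP-2 paper: image-text contrastive (ITC), image-text matching (ITM), and image-grounded text generation (the LM loss). A trivial sketch of what the logged train_loss aggregates (variable names are assumptions):

def stage1_total_loss(loss_itc: float, loss_itm: float, loss_lm: float) -> float:
    # Sum of the three stage-1 objectives; this is what appears as train_loss.
    return loss_itc + loss_itm + loss_lm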
OK. I will try again and report back the results. Thank you for your patience.
Hi @phellonchen, have you succeeded in pre-training the model from scratch?
Yes. Thanks for the authors' excellent work. While trying to train from scratch, I found that the language model I used at first was too small (1B parameters), so it was difficult to obtain good results without fine-tuning the language model. When I switched to a 6B-parameter language model, I got good results.
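For anyone attempting the same swap, the frozen language model is chosen in the model section of the stage-2 config. A minimal sketch based on the OPT variant; the arch, key names, and identifiers here are assumptions, so adjust them for the LM family and checkpoints you actually use:

model:
  arch: blip2_opt                   # assumed; pick the arch matching your LM family
  load_finetuned: False
  pretrained: "output/BLIP2/Pretrain_stage1/checkpoint_09.pth"  # hypothetical path to your own stage-1 checkpoint
  opt_model: "facebook/opt-6.7b"    # the frozen LM; swap in a larger or Chinese model here
  freeze_vit: True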
Thanks for your reply. I want to pre-train BLIP-2 on a Chinese dataset, so I need to replace the English language model with a Chinese one. Can you suggest which Chinese language model I should use?
Hi @phellonchen, is it possible to release your pre-training code? I want to test BLIP-2 on Chinese.
Hi @phellonchen, could you maybe elaborate on what you needed to change to perform stage 1 pre-training? That would be really helpful!
Same question!