
Pre-Training BLIP2 Log

Open phellonchen opened this issue 2 years ago • 8 comments

Could you release the pre-training logs of BLIP-2 for stages 1 and 2? When I tried to retrain the model, the loss seemed difficult to reduce.

phellonchen avatar Feb 22 '23 02:02 phellonchen

Hi, we do not fully support pre-training BLIP-2 from scratch. Our current implementation always loads a pre-trained BLIP-2 checkpoint by default, which could explain why you find the loss difficult to reduce: the model is already pre-trained.

We are incrementally working on supporting pre-training from scratch. In the meantime, feel free to modify our source code for this purpose, and you are welcome to post your solution here. Thanks!

LiJunnan1992 avatar Feb 23 '23 02:02 LiJunnan1992

I have modified the code to pre-train from scratch without loading a pre-trained BLIP-2 checkpoint, but the loss is still difficult to reduce.

I only changed the part so that the model does not load the pre-trained checkpoint, and I used 8 A100 GPUs for training.

phellonchen avatar Feb 23 '23 03:02 phellonchen

Could you share your modifications and the loss logs before and after your change?

LiJunnan1992 avatar Feb 23 '23 03:02 LiJunnan1992

# blip2_qformer
        load_finetuned = cfg.get("load_finetuned", True)
        load_pretrained = cfg.get("load_pretrained", True)
        if load_finetuned or load_pretrained:
            model.load_checkpoint_from_config(cfg)

        return model
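
For context, here is a minimal sketch of the kind of guard being described, assuming a dict-like cfg and the load_checkpoint_from_config method shown above; the helper name maybe_load_checkpoint is hypothetical, so treat this as an illustration rather than the exact LAVIS code:

    def maybe_load_checkpoint(model, cfg):
        """Load pre-trained / fine-tuned weights only when the config asks for it.

        A from-scratch pre-training run can set both flags to False so the
        Q-Former starts from random initialization instead of a released checkpoint.
        """
        load_finetuned = cfg.get("load_finetuned", False)
        load_pretrained = cfg.get("load_pretrained", False)
        if load_finetuned or load_pretrained:
            model.load_checkpoint_from_config(cfg)
        return model

With both flags set to False (either as code defaults or explicitly in the model config), training starts from random weights, which is consistent with the relatively high initial losses in the logs below.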

stage 1 logs for 3M images:

    {"train_lr": "0.000", "train_loss": "7.367"}
    {"train_lr": "0.000", "train_loss": "5.705"}
    {"train_lr": "0.000", "train_loss": "5.368"}
    {"train_lr": "0.000", "train_loss": "5.168"}
    {"train_lr": "0.000", "train_loss": "5.013"}
    {"train_lr": "0.000", "train_loss": "4.878"}
    {"train_lr": "0.000", "train_loss": "4.758"}
    {"train_lr": "0.000", "train_loss": "4.662"}
    {"train_lr": "0.000", "train_loss": "4.584"}
    {"train_lr": "0.000", "train_loss": "4.528"}

stage 1 logs for 20M images:

    {"train_lr": "0.000", "train_loss": "8.089"}
    {"train_lr": "0.000", "train_loss": "7.400"}
    {"train_lr": "0.000", "train_loss": "7.237"}
    {"train_lr": "0.000", "train_loss": "7.153"}
    {"train_lr": "0.000", "train_loss": "7.090"}

run:
  task: image_text_pretrain
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-4
  min_lr: 1e-5
  warmup_lr: 1e-6

  weight_decay: 0.05
  max_epoch: 10
  batch_size_train: 100 # 100
  batch_size_eval: 64 # 64
  num_workers: 4
  warmup_steps: 5000

  seed: 42
  output_dir: "output/BLIP2/Pretrain_stage1"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True

phellonchen avatar Feb 23 '23 04:02 phellonchen

The loss is decreasing normally, as expected. Note that the logged loss is averaged over a short window, so some fluctuation is normal.

LiJunnan1992 avatar Feb 23 '23 04:02 LiJunnan1992
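
As a rough picture of what "averaged over a short window" means, a generic smoothed meter could look like the sketch below; this is illustrative only and not taken from the LAVIS logger:

    from collections import deque

    class SmoothedLoss:
        """Sliding-window average of recent loss values; the logged number
        fluctuates less than the raw per-step loss."""

        def __init__(self, window_size=50):
            self.window = deque(maxlen=window_size)

        def update(self, value):
            self.window.append(float(value))

        @property
        def average(self):
            return sum(self.window) / max(1, len(self.window))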

Thanks. Could you tell me what loss value indicates good convergence for stage 1 and stage 2? When I train the model for 10 epochs, the loss still seems high.

phellonchen avatar Feb 23 '23 05:02 phellonchen

The logged loss is the sum of the three pre-training losses.

The value of the loss depends on the pre-training dataset, so it is hard to define a threshold for a "good" loss. You may use a validation dataset to evaluate the model's performance during pre-training.

LiJunnan1992 avatar Feb 23 '23 05:02 LiJunnan1992
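
For reference, BLIP-2 stage 1 optimizes three objectives: image-text contrastive (ITC), image-text matching (ITM), and image-grounded text generation (LM). The logged number is their sum, as in this schematic sketch (illustrative only, not the exact LAVIS forward pass):

    import torch

    def stage1_total_loss(loss_itc: torch.Tensor,
                          loss_itm: torch.Tensor,
                          loss_lm: torch.Tensor) -> torch.Tensor:
        """The logged training loss is ITC + ITM + LM; its absolute value depends
        on the dataset and tokenizer, so there is no universal "good" threshold."""
        return loss_itc + loss_itm + loss_lm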

OK, I will try again and report back the results. Thank you for your patience.

phellonchen avatar Feb 23 '23 05:02 phellonchen

@phellonchen Hi, have you succeeded in pre-training the model from scratch?

vanpersie32 avatar Mar 16 '23 02:03 vanpersie32

Yes. Thanks to the authors for their excellent work. While trying to train from scratch, I found that the language model I used at first was too small (1B parameters), so it was difficult to obtain good results without fine-tuning the language model. When I switched to a language model with 6B parameters, I got good results.

phellonchen avatar Mar 16 '23 07:03 phellonchen

The loss is decreasing normally, as expected. Note that the logged loss is averaged over a short window, so some fluctuation is normal.

Thanks for your reply. I want to pre-train BLIP-2 on a Chinese dataset, so I would need to replace the English language model with a Chinese one. Can you suggest which Chinese language model I should use?

vanpersie32 avatar Mar 17 '23 03:03 vanpersie32

Yes. Thanks to the authors for their excellent work. While trying to train from scratch, I found that the language model I used at first was too small (1B parameters), so it was difficult to obtain good results without fine-tuning the language model. When I switched to a language model with 6B parameters, I got good results.

Hi @phellonchen, is it possible to release your pre-training code? I want to test BLIP-2 on Chinese data.

etrigger avatar Mar 25 '23 09:03 etrigger

Hi @phellonchen, could you maybe elaborate on what you needed to change to perform stage 1 pre-training? That would be really helpful!

ChantalMP avatar May 03 '23 09:05 ChantalMP

The loss is decreasing normally, as expected. Note that the logged loss is averaged over a short window, so some fluctuation is normal.

Thanks for your reply. I want to pre-train BLIP-2 on a Chinese dataset, so I would need to replace the English language model with a Chinese one. Can you suggest which Chinese language model I should use?


Same question!

ZechengLi19 avatar Nov 29 '23 16:11 ZechengLi19