Pre-Training BLIP2 Log
Could you release the pre-training logs for BLIP-2 stages 1 and 2? When I tried to retrain the model, the loss seemed difficult to reduce.
Hi, we do not fully support pre-training BLIP-2 from scratch. Our current implementation always loads a pre-trained BLIP-2 checkpoint by default. This could explain why you find the loss difficult to reduce: the model is already pre-trained.
We are incrementally working on supporting pre-training from scratch. In the meantime, feel free to modify our source code to achieve this, and you are welcome to post your solutions here. Thanks!
I have modified the code to pre-train from scratch so that it does not load a pre-trained BLIP-2 checkpoint, but the loss is still difficult to reduce.
I only changed the part that loads the pre-trained checkpoint, so the model no longer loads it, and I used 8 A100 GPUs for training.
Could you share your modifications and the loss logs from before and after your change?
# blip2_qformer.py, from_config()
load_finetuned = cfg.get("load_finetuned", True)
load_pretrained = cfg.get("load_pretrained", True)

if load_finetuned or load_pretrained:
    model.load_checkpoint_from_config(cfg)

return model
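Given the flags read above, one way to skip loading the released checkpoint without touching the source is to override them in the model section of the pre-training config. A minimal sketch, where the arch and model_type values are assumptions; keep whatever your stage-1 config already specifies:

model:
  arch: blip2            # assumed stage-1 arch; keep your config's value
  model_type: pretrain   # assumed; keep your config's value
  load_pretrained: False # skip loading the pre-trained BLIP-2 checkpoint
  load_finetuned: False  # skip loading any fine-tuned checkpoint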
Stage 1 logs for 3M images:
{"train_lr": "0.000", "train_loss": "7.367"}
{"train_lr": "0.000", "train_loss": "5.705"}
{"train_lr": "0.000", "train_loss": "5.368"}
{"train_lr": "0.000", "train_loss": "5.168"}
{"train_lr": "0.000", "train_loss": "5.013"}
{"train_lr": "0.000", "train_loss": "4.878"}
{"train_lr": "0.000", "train_loss": "4.758"}
{"train_lr": "0.000", "train_loss": "4.662"}
{"train_lr": "0.000", "train_loss": "4.584"}
{"train_lr": "0.000", "train_loss": "4.528"}

Stage 1 logs for 20M images:
{"train_lr": "0.000", "train_loss": "8.089"}
{"train_lr": "0.000", "train_loss": "7.400"}
{"train_lr": "0.000", "train_loss": "7.237"}
{"train_lr": "0.000", "train_loss": "7.153"}
{"train_lr": "0.000", "train_loss": "7.090"}
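To inspect convergence, one can parse the per-epoch stats the run writes out. A minimal sketch, assuming one JSON object per line in log.txt under the output_dir given in the config below:

import json

# Print the train_loss curve from the run's log file.
# The path assumes output_dir: "output/BLIP2/Pretrain_stage1" as in the config below.
log_path = "output/BLIP2/Pretrain_stage1/log.txt"

with open(log_path) as f:
    for epoch, line in enumerate(f):
        line = line.strip()
        if not line:
            continue
        stats = json.loads(line)
        if "train_loss" in stats:
            print(f"epoch {epoch}: train_loss = {float(stats['train_loss']):.3f}")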
run:
  task: image_text_pretrain
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-4
  min_lr: 1e-5
  warmup_lr: 1e-6

  weight_decay: 0.05
  max_epoch: 10
  batch_size_train: 100
  batch_size_eval: 64
  num_workers: 4
  warmup_steps: 5000

  seed: 42
  output_dir: "output/BLIP2/Pretrain_stage1"

  amp: True
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
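For reference, a rough sketch of what linear_warmup_cosine_lr does with the hyperparameters above (an illustration, not the exact LAVIS scheduler):

import math

def lr_at(cur_step, cur_epoch, max_epoch=10, init_lr=1e-4, min_lr=1e-5,
          warmup_lr=1e-6, warmup_steps=5000):
    # Illustrative linear-warmup + cosine-decay schedule using the
    # hyperparameters from the config above (not the exact LAVIS code).
    if cur_epoch == 0 and cur_step < warmup_steps:
        # Linear warmup from warmup_lr up to init_lr over the first warmup_steps steps.
        return warmup_lr + (init_lr - warmup_lr) * cur_step / max(1, warmup_steps)
    # Cosine decay from init_lr down to min_lr over the remaining epochs.
    progress = cur_epoch / max(1, max_epoch)
    return min_lr + 0.5 * (init_lr - min_lr) * (1.0 + math.cos(math.pi * progress))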
The loss decreases normally, as expected. Note that the logged loss is averaged over a short window, so some fluctuation is normal.
Thanks. Could you tell me what loss value indicates good convergence for stage 1 and stage 2? When I train the model for 10 epochs, the loss still seems high.
The logged loss is the sum of the three pre-training losses.
The value of the loss depends on the pre-training dataset, so it is hard to define a threshold for a "good" loss. You may use a validation dataset to evaluate the model's performance during pre-training.
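For reference, the three losses being summed are the stage-1 objectives from the BLIP-2 paper: image-text contrastive (ITC), image-text matching (ITM), and image-grounded text generation (the LM loss). A trivial sketch of what the logged train_loss aggregates (variable names are assumptions):

def stage1_total_loss(loss_itc: float, loss_itm: float, loss_lm: float) -> float:
    # Sum of the three stage-1 objectives; this is what appears as train_loss.
    return loss_itc + loss_itm + loss_lm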
OK. I will try again and report back the results. Thank you for your patience.
Hi @phellonchen, have you succeeded in pre-training the model from scratch?
Yes. Thanks for the authors' excellent work. While trying to train from scratch, I found that the language model I used at first was too small (1B parameters), so it was difficult to obtain good results without fine-tuning the language model. When I switched to a 6B-parameter language model, I got good results.
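For anyone attempting the same swap, the frozen language model is chosen in the model section of the stage-2 config. A minimal sketch based on the OPT variant; the arch, key names, and identifiers here are assumptions, so adjust them for the LM family and checkpoints you actually use:

model:
  arch: blip2_opt                   # assumed; pick the arch matching your LM family
  load_finetuned: False
  pretrained: "output/BLIP2/Pretrain_stage1/checkpoint_09.pth"  # hypothetical path to your own stage-1 checkpoint
  opt_model: "facebook/opt-6.7b"    # the frozen LM; swap in a larger or Chinese model here
  freeze_vit: True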
Thanks for your reply. I want to pre-train BLIP-2 on a Chinese dataset, so I need to replace the English language model with a Chinese one. Can you suggest which Chinese language model I should use?
Hi @phellonchen, is it possible to release your pre-training code? I want to test BLIP-2 on Chinese.
Hi @phellonchen, could you maybe elaborate on what you needed to change to perform stage 1 pre-training? That would be really helpful!
Same question!