[BLIP2] How to perform stage 1 Vision-Language Representation bootstrapping
I think that stage 1 training, i.e. vision-language representation learning with the three objectives mentioned in the paper, is not yet implemented. Am I right?
The unimplemented `load_pretrained: False` in `pretrain_stage1.yaml` is only part of the problem: the LLM is connected, so what runs is rather the generative learning from stage 2. Is there currently any way to properly perform stage 1 vision-language representation bootstrapping?
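For context, the setting in question sits in the model block of `pretrain_stage1.yaml`. Below is a minimal sketch of what that block would look like with pretrained weights disabled; the `arch`, `model_type`, and `freeze_vit` entries are assumptions based on typical LAVIS configs and should be checked against your checkout:

```yaml
model:
  arch: blip2             # assumed stage 1 arch: Q-Former + frozen image encoder, no LLM attached
  model_type: pretrain    # assumed model_type for the pre-training config
  load_pretrained: False  # bootstrap the Q-Former from scratch instead of loading a released checkpoint
  freeze_vit: True        # assumed key: keep the image encoder frozen during representation learning
```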
You can now run BLIP-2 stage 1 pre-training with `bash run_scripts/blip2/train/pretrain_stage1.sh`.
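For reference, that run script is essentially a wrapper around the distributed training entry point. A minimal sketch of an equivalent invocation is shown below; the GPU count is illustrative and the config path assumes the default repository layout:

```sh
# Launch BLIP-2 stage 1 pre-training on 8 GPUs (adjust --nproc_per_node to your machine).
python -m torch.distributed.run --nproc_per_node=8 train.py \
    --cfg-path lavis/projects/blip2/train/pretrain_stage1.yaml
```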
Thank you.