[BLIP2] How to perform stage 1 Vision-Language Representation bootstrapping
I think that stage 1 training, i.e. vision-language representation learning with the three objectives mentioned in the paper, is not yet implemented. Am I right?
The unimplemented `load_pretrained: False` in `pretrain_stage1.yaml` is only part of the problem: the LLM is connected, so what runs is rather the generative learning from stage 2. Is there currently any way to properly perform stage 1 vision-language representation bootstrapping?
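For context, the setting in question sits in the model block of `pretrain_stage1.yaml`. Below is a minimal sketch of what that block would look like with pretrained weights disabled; the `arch`, `model_type`, and `freeze_vit` entries are assumptions based on typical LAVIS configs and should be checked against your checkout:

```yaml
model:
  arch: blip2             # assumed stage 1 arch: Q-Former + frozen image encoder, no LLM attached
  model_type: pretrain    # assumed model_type for the pre-training config
  load_pretrained: False  # bootstrap the Q-Former from scratch instead of loading a released checkpoint
  freeze_vit: True        # assumed key: keep the image encoder frozen during representation learning
```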
You can now run BLIP-2 stage 1 pre-training with `bash run_scripts/blip2/train/pretrain_stage1.sh`.
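For reference, that run script is essentially a wrapper around the distributed training entry point. A minimal sketch of an equivalent invocation is shown below; the GPU count is illustrative and the config path assumes the default repository layout:

```sh
# Launch BLIP-2 stage 1 pre-training on 8 GPUs (adjust --nproc_per_node to your machine).
python -m torch.distributed.run --nproc_per_node=8 train.py \
    --cfg-path lavis/projects/blip2/train/pretrain_stage1.yaml
```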
Thank you.