
[BLIP2] How to perform stage 1 Vision-Language Representation bootstrapping

Open klima7 opened this issue 1 year ago • 1 comments

I think that stage 1 training, i.e. vision-language representation learning with the three objectives described in the paper, is not yet implemented. Am I right?

That load_pretrained: False in pretrain_stage1.yaml is not implemented is only part of the problem. An LLM is connected, so what runs looks more like the generative learning of stage 2. Is there currently any way to properly perform stage 1 Vision-Language Representation bootstrapping?

klima7, Apr 06 '23 07:04