
Add BLIP model

(Open) innat opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

BLIP: Bootstrapping Language-Image Pre-training (2022) is a model that can perform various multi-modal tasks, including:

  • Image Captioning
  • Visual Question Answering
  • Image-Text retrieval (Image-text matching)
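For image-text retrieval, the BLIP paper first ranks candidates by the similarity of unimodal image and text embeddings (trained with an image-text contrastive objective) and then re-ranks the top candidates with an image-text matching head. The NumPy sketch below covers only that first, embedding-similarity stage. It assumes the embeddings have already been produced and projected to a shared dimension; the function name and the toy 3-d vectors are hypothetical and are not taken from BLIP, KerasNLP, or Hugging Face.

```python
import numpy as np


def rank_texts_by_similarity(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Return caption indices sorted from most to least similar to the image.

    image_emb: shape (d,), one projected image embedding (assumed precomputed).
    text_embs: shape (n, d), one projected embedding per candidate caption.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = txt @ img  # shape (n,)
    # argsort is ascending, so reverse it for a best-first ranking.
    return np.argsort(scores)[::-1]


# Toy embeddings standing in for real encoder outputs (hypothetical values).
image = np.array([1.0, 0.0, 0.0])
captions = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the image
    [0.9, 0.1, 0.0],  # closely aligned
    [0.5, 0.5, 0.0],  # partially aligned
])
print(rank_texts_by_similarity(image, captions))  # [1 2 0]
```

In BLIP itself, only the top-k captions from this cheap ranking would be passed to the more expensive matching head for re-scoring.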

(Cited by 247 as of this writing.)

Relevant discussion

Describe the solution you'd like

Info: I've gone through the source code of the official BLIP repo, specifically its image-captioning mode, and found that most of the code is taken from Hugging Face Transformers and modified with the authors' proposed changes. The NLP component of BLIP is mostly BERT.

Since HF also provides a TF version of BERT, translating the code is straightforward; building on KerasNLP's BERT instead might need extra care.

Describe alternatives you've considered

None.

Additional context

  • In BLIP, the only CV component is a Vision Transformer (ViT) used for feature extraction, and KerasCV provides a ViT model. The larger part of the BLIP model consists of the NLP component (specifically BERT).
  • BLIP-2 (HF blog post)
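The bullets above describe BLIP as a ViT image encoder feeding a mostly-BERT text model. In the BLIP paper, the image-grounded text encoder and decoder connect the two by adding a cross-attention sub-layer to each BERT block, so that text tokens attend to the ViT patch embeddings. The NumPy sketch below shows only that cross-attention step, assuming a single head, random weights, and toy dimensions; it omits the residual connection, layer norm, and feed-forward sub-layer of a real block and is not the official BLIP or KerasNLP implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(text_hidden, image_feats, w_q, w_k, w_v):
    """Single-head cross-attention: text tokens attend to image patches.

    text_hidden: (num_tokens, d_model), e.g. output of a BERT self-attention sub-layer.
    image_feats: (num_patches, d_model), e.g. ViT patch embeddings.
    w_q, w_k, w_v: (d_model, d_model) projection matrices (random here).
    """
    q = text_hidden @ w_q  # queries come from the text side
    k = image_feats @ w_k  # keys and values come from the image side
    v = image_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_tokens, num_patches)
    weights = softmax(scores, axis=-1)       # each token's weights over patches sum to 1
    return weights @ v                       # (num_tokens, d_model)


rng = np.random.default_rng(0)
# 197 = 14 * 14 patches + 1 [CLS] token for a 224px image with 16px patches.
d_model, num_tokens, num_patches = 8, 5, 197
text = rng.normal(size=(num_tokens, d_model))
patches = rng.normal(size=(num_patches, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = cross_attention(text, patches, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Because queries come from text and keys/values from the image, the output keeps one vector per text token while mixing in visual information, which is why BLIP can reuse most of BERT unchanged around this added sub-layer.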

Update

A TF version of BLIP has been added to Hugging Face Transformers ❤️ cc @Rocketknight1

innat, Mar 30 '23 09:03