keras-nlp
Add BLIP model
Is your feature request related to a problem? Please describe.
BLIP: Bootstrapping Language-Image Pre-training (2022) is a model that can perform various multi-modal tasks, including
- Image Captioning
- Visual Question Answering
- Image-Text retrieval (Image-text matching)
(Cited by 247 as of now.)
Describe the solution you'd like
Info: I've gone through the source code of the official BLIP repo, specifically its image-captioning model, and found that most of the code is taken from huggingface-transformers and modified with their proposed changes. The NLP component of BLIP is, for the most part, BERT.
In short: since HF also provides TF-BERT, translating the code from there is straightforward, but doing it with KerasNLP-BERT might need extra care (see the sketch below).
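To make the "extra care" point concrete, here is a minimal sketch (not a port) of how BLIP's image-grounded text decoder could be approximated with KerasNLP building blocks. BLIP's decoder is BERT with causal self-attention plus cross-attention to the image tokens, which `keras_nlp.models.BertBackbone` does not expose, so a custom stack would be needed with the pretrained weights mapped into it. All layer counts, sizes, and names below are my own assumptions, not the official BLIP config.

```python
import keras_nlp
from tensorflow import keras

# Assumed sizes: BERT-base-like text side, ViT-B/16-like vision side.
VOCAB_SIZE = 30522
MAX_CAPTION_LEN = 40
HIDDEN_DIM = 768
NUM_LAYERS = 4   # BLIP uses 12; kept small for the sketch
NUM_HEADS = 12

# Inputs: caption token ids and patch features coming from a ViT encoder
# (a ViT-B/16 on 224x224 images yields 196 patch tokens of width 768).
token_ids = keras.Input(shape=(MAX_CAPTION_LEN,), dtype="int32", name="token_ids")
image_features = keras.Input(shape=(None, HIDDEN_DIM), name="image_features")

# Token + position embeddings for the caption.
x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_CAPTION_LEN,
    embedding_dim=HIDDEN_DIM,
)(token_ids)

# BLIP's image-grounded text decoder is BERT with causal self-attention plus
# cross-attention to the image tokens; keras_nlp.layers.TransformerDecoder
# supports both, so a stack of them mirrors that structure.
for _ in range(NUM_LAYERS):
    x = keras_nlp.layers.TransformerDecoder(
        intermediate_dim=4 * HIDDEN_DIM,
        num_heads=NUM_HEADS,
    )(decoder_sequence=x, encoder_sequence=image_features)

# Language-model head over the vocabulary.
logits = keras.layers.Dense(VOCAB_SIZE, name="lm_head")(x)

caption_decoder = keras.Model([token_ids, image_features], logits, name="blip_style_decoder")
caption_decoder.summary()
```

The hard part would then be mapping BLIP's BERT weights from the official checkpoint into a stack like this, which is exactly where the KerasNLP route needs more care than the HF TF-BERT route.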
Describe alternatives you've considered
none.
Additional context
- For BLIP, the CV component is only a Vision Transformer used for feature extraction; KerasCV already provides ViT models. The larger part of the BLIP model consists of the NLP component (specifically BERT). A vision-side sketch follows after this list.
- BLIP 2 - HF-Blog
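Continuing the sketch above, the vision side only has to hand the decoder a sequence of patch tokens to cross-attend to. The patch embedder below is a plain-Keras stand-in for the pretrained ViT that KerasCV provides (I haven't pinned down KerasCV's exact ViT class names or output shapes, so treat every name here as an assumption); it only illustrates the shape contract: a 224x224 image with 16x16 patches gives 14x14 = 196 tokens of width 768 (the real ViT also prepends a class token).

```python
import keras_nlp
from tensorflow import keras

IMAGE_SIZE = 224
PATCH_SIZE = 16
HIDDEN_DIM = 768
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 196

# Stand-in ViT feature extractor: patchify with a strided Conv2D, add position
# embeddings, then a couple of Transformer encoder blocks. In the real port this
# whole model would be replaced by KerasCV's pretrained ViT.
images = keras.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name="images")
patches = keras.layers.Conv2D(
    HIDDEN_DIM, kernel_size=PATCH_SIZE, strides=PATCH_SIZE, name="patch_embedding"
)(images)
tokens = keras.layers.Reshape((NUM_PATCHES, HIDDEN_DIM))(patches)
tokens = tokens + keras_nlp.layers.PositionEmbedding(sequence_length=NUM_PATCHES)(tokens)
for _ in range(2):  # ViT-B/16 has 12 such blocks
    tokens = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=4 * HIDDEN_DIM, num_heads=12
    )(tokens)

vision_encoder = keras.Model(images, tokens, name="vit_stand_in")

# Wiring with the decoder sketch above:
#   image_features = vision_encoder(image_batch)
#   logits = caption_decoder([caption_token_ids, image_features])
```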
Update
A TF version of BLIP has been added to Hugging Face Transformers ❤️ cc @Rocketknight1 (quick usage sketch below)
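For completeness, a quick (untested here) sketch of loading that TF port from transformers; it could serve as a numerical reference while porting to KerasNLP. The demo image URL is just an example.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, TFBlipForConditionalGeneration

# Salesforce's captioning checkpoint on the Hugging Face Hub.
checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = TFBlipForConditionalGeneration.from_pretrained(checkpoint)

# Any RGB image works; this is just a demo image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="tf")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```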