blip2 topic
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
fashion_image_caption
Automate Fashion Image Captioning using BLIP-2. Automatic generating descriptions of clothes on shopping websites, which can help customers without fashion knowledge to better understand the features...
PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
chat-with-nerf
Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
MiniGPT-4-discord-bot
A true multimodal LLaMA derivative -- on Discord!