blip2 topic

List blip2 repositories

Video-LLaMA

2.7k
Stars
242
Forks

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

BLIVA

264
Stars
27
Forks

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

fashion_image_caption

43
Stars
7
Forks

Automate Fashion Image Captioning using BLIP-2. Automatically generating descriptions of clothes on shopping websites, which can help customers without fashion knowledge to better understand the features...

PaddleMIX

708
Stars
223
Forks
708
Watchers

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...

chat-with-nerf

302
Stars
19
Forks

Chat with NeRF enables users to interact with a NeRF model by typing in natural language.

ComCLIP

36
Stars
5
Forks
36
Watchers

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

qformer

29
Stars
0
Forks

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

MiniGPT-4-discord-bot

44
Stars
2
Forks

A true multimodal LLaMA derivative -- on Discord!

Vision-Language-Models-Overview

444
Stars
22
Forks
444
Watchers

A collection and survey of frontier vision-language model papers and model GitHub repositories. Continuously updated.

ai-powered-video-analyzer

40
Stars
14
Forks
40
Watchers

An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll...