vision-language-model topic

Repositories tagged vision-language-model

prismer

1.3k Stars · 74 Forks · 9 Watchers

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

LLaVA

17.1k Stars · 1.8k Forks · 135 Watchers

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.

AdvancedLiterateMachinery

1.1k Stars · 133 Forks

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Awesome-Controllable-Generation

289 Stars · 17 Forks

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, T2I-Adapter, IP-Adapter.

VLM_survey

1.8k Stars · 166 Forks

A collection of awesome vision-language models for vision tasks.

multimodal-maestro

963 Stars · 68 Forks

Effective prompting for Large Multimodal Models such as GPT-4 Vision, LLaVA, or CogVLM. 🔥

Multi-Modality-Arena

387 Stars · 26 Forks

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

vllm-safety-benchmark

44 Stars · 1 Fork

Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"

VoxPoser

410 Stars · 52 Forks

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models