vision-language-model topic

List of vision-language-model repositories

prismer

1.3k
Stars
75
Forks

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

LLaVA

19.5k
Stars
2.1k
Forks

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

AdvancedLiterateMachinery

1.4k
Stars
164
Forks

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Awesome-Controllable-Diffusion

495
Stars
32
Forks
495
Watchers

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.

VLM_survey

2.4k
Stars
214
Forks

Collection of AWESOME vision-language models for vision tasks

maestro

2.6k
Stars
219
Forks
2.6k
Watchers

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

Multi-Modality-Arena

450
Stars
34
Forks

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

vllm-safety-benchmark

63
Stars
2
Forks

[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"

VoxPoser

528
Stars
73
Forks

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models