vision-language topic

List vision-language repositories

hulc

58 stars, 9 forks

Hierarchical Universal Language Conditioned Policies

marqo

4.5k stars, 185 forks

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

GroundingDINO

5.3k stars, 557 forks

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

AdvancedLiterateMachinery

1.4k stars, 164 forks

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Awesome-Controllable-Diffusion

495 stars, 32 forks, 495 watchers

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.

Open-GroundingDino

254 stars, 41 forks

A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".

d-cube

104 stars, 7 forks

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Video-ChatGPT

1.2k stars, 102 forks

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for...

ViP-LLaVA

292 stars, 22 forks

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Awesome-Multimodal-Chatbot

63 stars, 6 forks

Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamle...