vision-language topics

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

AlibabaResearch

artificial-intelligence

computer-vision

document

document-analysis

Awesome-Controllable-Diffusion

495

Stars

32

Forks

495

Watchers

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.

atfortes

awesome

chatgpt

cot

flamingo

Open-GroundingDino

254

Stars

41

Forks

Watchers

This is the third party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.

longzw1997

object-detection

open-world

open-world-detection

vision-language

d-cube

104

Stars

7

Forks

Watchers

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

shikras

dataset

multi-modal-learning

object-detection

open-vocabulary-detection

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for...

mbzuai-oryx

chatbot

clip

gpt-4

llama

ViP-LLaVA

292

Stars

22

Forks

Watchers

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

WisconsinAIVision

chatbot

clip

foundation-models

gpt-4

Awesome-Multimodal-Chatbot

63

Stars

6

Forks

Watchers

Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamle...

zjr2000

awesome

chat-application

chatbot

general-ai