vision-and-language topics

pytorch_ldast

Stars: 23

Forks: 1

A PyTorch implementation of LDAST

tsujuifu

artistic-style-transfer

eccv2022

image-editing

pytorch

VLDet

Stars: 179

Forks: 11

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

clin1223

iclr2023

multi-modal

object-detection

open-vocabulary

LRV-Instruction

Stars: 249

Forks: 13

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

FuxiaoLiu

chatgpt

evaluation

evaluation-metrics

foundation-models

Perceiver_VL

Stars: 32

Forks: 3

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

zinengtang

efficiency

retrieval

scalability

video-language

VisualNews-Repository

Stars: 83

Forks: 9

[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning

FuxiaoLiu

alignment

benchmark

dataset

datasets

sugar-crepe

Stars: 59

Forks: 7

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

RAIVNLab

benchmark

deep-learning

multi-modal-learning

pytorch

PointLLM

Stars: 538

Forks: 24

[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds

OpenRobotLab

3d

chatbot

foundation-models

gpt-4

LLaVAR

Stars: 254

Forks: 12

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

SALT-NLP

chatbot

chatgpt

gpt-4

instruction-tuning

VidSitu

Stars: 56

Forks: 8

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

TheShadow29

captioning

captioning-videos

event-relations

grounding

hulc2

Stars: 30

Forks: 2

[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data

mees

computer-vision

deep-learning

grounding

manipulation