vision-and-language topics

hateful_memes-hate_detectron

53 Stars

19 Forks

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: prize-winning solution to the Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

rizavelioglu

challenge

hateful-memes

hateful-memes-challenge

multimodal-deep-learning

VLCAP

28 Stars

5 Forks

[ICIP 2022] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

UARK-AICV

contrastive-learning

transformer

video-captioning

vision-and-language

open-fashion-clip

48 Stars

5 Forks

Official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data" (ICIAP 2023).

aimagelab

clip

contrastive-learning

fashionai

vision-and-language

MAC

23 Stars

0 Forks

An end-to-end masked contrastive video-and-language pre-training framework

shufangxun

activitynet

clip

contrastive-learning

didemo

DRFT

18 Stars

0 Forks

End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021

wenz116

computer-vision

contrastive-learning

deep-learning

pytorch

awesome-vision-language-models-for-earth-observation

155 Stars

12 Forks

A curated list of awesome vision and language resources for earth observation.

geoaigroup

awesome

awesome-list

earth-observation

multimodal-deep-learning

RS5M

168 Stars

6 Forks

RS5M: a large-scale vision language dataset for remote sensing

om-ai-lab

foundation-models

remote-sensing

vision-and-language

Cross-Modal-Adapter

51 Stars

2 Forks

[arXiv] Cross-Modal Adapter for Text-Video Retrieval

LeapLabTHU

adapter

clip

deep-learning

machine-learning

VidChapters

175 Stars

20 Forks

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

antoyang

dense-video-captioning

multimodal-learning

pre-training

temporal-language-grounding

EDA

96 Stars

4 Forks

[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

yanmin-wu

3d-vision-and-language

3d-visual-grounding

vision-and-language

visual-grounding