multimodal-deep-learning topic

List multimodal-deep-learning repositories

Taris

25
Stars
6
Forks
Watchers

Transformer-based online speech recognition system with TensorFlow 2

FVTA_MemexQA

33
Stars
15
Forks
Watchers

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

scarches

314
Stars
49
Forks
Watchers

Reference mapping for single-cell genomics

awesome-grounding

973
Stars
95
Forks
Watchers

awesome grounding: A curated list of research papers in visual grounding

awesome-vision-language-pretraining-papers

1.1k
Stars
101
Forks
Watchers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

video-captioning

165
Stars
66
Forks
Watchers

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the...

multimodal-deep-learning

662
Stars
141
Forks
Watchers

This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

blended-latent-diffusion

519
Stars
32
Forks
Watchers

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]