multimodal-deep-learning topic
Taris
Transformer-based online speech recognition system with TensorFlow 2
FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
densecap-pytorch
A simplified PyTorch version of DenseCap
awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
scarches
Reference mapping for single-cell genomics
awesome-grounding
Awesome Grounding: a curated list of research papers in visual grounding
awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
video-captioning
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates a caption in English describing the...
multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]