multimodal-deep-learning topic
Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering, and 3D Dense Captioning) papers and datasets.
attentive-modality-hopping-for-SER
TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20
Multimodal-Learning
This repository contains the source code for the paper "Improving the performance of unimodal dynamic hand gesture recognition with multimodal training"
awesome-visual-dialog
Recent Advances in Visual Dialog
mmae
Package for Multimodal Autoencoders in TensorFlow / Keras
DeepCU-IJCAI19
DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis, IJCAI-19
SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
hateful_memes-hate_detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.