# Multimodal datasets

This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.
## Multimodal datasets for NLP Applications

### Sentiment Analysis
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| EmoDB | A Database of German Emotional Speech | Paper | Dataset |
| VAM | The Vera am Mittag German Audio-Visual Emotional Speech Database | Paper | Dataset |
| IEMOCAP | IEMOCAP: interactive emotional dyadic motion capture database | Paper | Dataset |
| Mimicry | A Multimodal Database for Mimicry Analysis | Paper | Dataset |
| YouTube | Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web | Paper | Dataset |
| HUMAINE | The HUMAINE database | Paper | Dataset |
| Large Movies | Sentiment classification on Large Movie Review | Paper | Dataset |
| SEMAINE | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Paper | Dataset |
| AFEW | Collecting Large, Richly Annotated Facial-Expression Databases from Movies | Paper | Dataset |
| SST | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | Paper | Dataset |
| ICT-MMMO | YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context | Paper | Dataset |
| RECOLA | Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions | Paper | Dataset |
| MOUD | Utterance-Level Multimodal Sentiment Analysis | Paper | - |
| CMU-MOSI | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos | Paper | Dataset |
| POM | Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia | Paper | Dataset |
| MELD | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Paper | Dataset |
| CMU-MOSEI | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | Paper | Dataset |
| AMMER | Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning | Paper | On Request |
| SEWA | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Paper | Dataset |
| Fakeddit | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection | Paper | Dataset |
| CMU-MOSEAS | CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French | Paper | Dataset |
| MultiOFF | Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text | Paper | Dataset |
| MEISD | MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations | Paper | Dataset |
| TASS | Overview of TASS 2020: Introducing Emotion | Paper | Dataset |
| CH-SIMS | CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality | Paper | Dataset |
| Creep-Image | A Multimodal Dataset of Images and Text | Paper | Dataset |
| Entheos | Entheos: A Multimodal Dataset for Studying Enthusiasm | Paper | Dataset |
### Machine Translation
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| Multi30K | Multi30K: Multilingual English-German Image Descriptions | Paper | Dataset |
| How2 | How2: A Large-scale Dataset for Multimodal Language Understanding | Paper | Dataset |
| MLT | Multimodal Lexical Translation | Paper | Dataset |
| IKEA | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Paper | Dataset |
| Flickr30K (EN-(hi-IN)) | Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | Paper | On Request |
| Hindi Visual Genome | Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation | Paper | Dataset |
| HowTo100M | Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models | Paper | Dataset |
### Information Retrieval
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| MusiCLEF | MusiCLEF: a Benchmark Activity in Multimodal Music Information Retrieval | Paper | Dataset |
| Moodo | The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval | Paper | Dataset |
| ALF-200k | ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists | Paper | Dataset |
| MQA | Can Image Captioning Help Passage Retrieval in Multimodal Question Answering? | Paper | Dataset |
| WAT2019 | WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Paper | Dataset |
| ViTT | Multimodal Pretraining for Dense Video Captioning | Paper | Dataset |
| MTD | MTD: A Multimodal Dataset of Musical Themes for MIR Research | Paper | Dataset |
| MusiCLEF | A professionally annotated and enriched multimodal data set on popular music | Paper | Dataset |
| Schubert Winterreise | Schubert Winterreise dataset: A multimodal scenario for music analysis | Paper | Dataset |
| WIT | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Paper | Dataset |
### Question Answering
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| MQA | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain | Paper | - |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-Answering | Paper | Dataset |
| PororoQA | DeepStory: Video Story QA by Deep Embedded Memory Networks | Paper | Dataset |
| MemexQA | MemexQA: Visual Memex Question Answering | Paper | Dataset |
| VQA | Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering | Paper | Dataset |
| TDIUC | An analysis of visual question answering algorithms | Paper | Dataset |
| TGIF-QA | TGIF-QA: Toward spatio-temporal reasoning in visual question answering | Paper | Dataset |
| MSVD-QA, MSRVTT-QA | Video question answering via attribute augmented attention network learning | Paper | Dataset |
| YouTube2Text | Video Question Answering via Gradually Refined Attention over Appearance and Motion | Paper | Dataset |
| MovieFIB | A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering | Paper | Dataset |
| Video Context QA | Uncovering the temporal context for video question answering | Paper | Dataset |
| MarioQA | MarioQA: Answering questions by watching gameplay videos | Paper | Dataset |
| TVQA | TVQA: Localized, compositional video question answering | Paper | Dataset |
| VQA-CP v2 | Don’t just assume; look and answer: Overcoming priors for visual question answering | Paper | Dataset |
| RecipeQA | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | Paper | Dataset |
| GQA | GQA: A new dataset for real-world visual reasoning and compositional question answering | Paper | Dataset |
| Social-IQ | Social-IQ: A question answering benchmark for artificial social intelligence | Paper | Dataset |
| MIMOQA | MIMOQA: Multimodal Input Multimodal Output Question Answering | Paper | - |
### Summarization
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| SumMe | Creating summaries from user videos | Paper | Dataset |
| TVSum | TVSum: Summarizing web videos using titles | Paper | Dataset |
| QFVS | Query-focused video summarization: Dataset, evaluation, and a memory network based approach | Paper | Dataset |
| MMSS | Multi-modal Sentence Summarization with Modality Attention and Image Filtering | Paper | - |
| MSMO | MSMO: Multimodal Summarization with Multimodal Output | Paper | - |
| Screen2Words | Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | Paper | Dataset |
| AVIATE | See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization | Paper | Dataset |
| Multimodal Microblog Summarization | On Multimodal Microblog Summarization | Paper | - |
### Human Computer Interaction
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| CUAVE | CUAVE: A new audio-visual database for multimodal human-computer interface research | Paper | Dataset |
| MHAD | Berkeley MHAD: A comprehensive multimodal human action database | Paper | Dataset |
| Multi-party interactions | A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction | Paper | - |
| MHHRI | Multimodal human-human-robot interactions (MHHRI) dataset for studying personality and engagement | Paper | [Dataset](https://www.cl.cam.ac.uk/research/rainbow/projects/mhhri/) |
| Red Hen Lab | Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research | Paper | - |
| EMRE | Generating a Novel Dataset of Multimodal Referring Expressions | Paper | Dataset |
| Chinese Whispers | Chinese whispers: A multimodal dataset for embodied language grounding | Paper | Dataset |
| uulmMAC | The uulmMAC database—A multimodal affective corpus for affective computing in human-computer interaction | Paper | Dataset |
### Semantic Analysis
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| WN9-IMG | Image-embodied Knowledge Representation Learning | Paper | Dataset |
| Wikimedia Commons | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | Paper | Dataset |
| Starsem18-multimodalKB | A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning | Paper | [Dataset](https://github.com/UKPLab/starsem18-multimodalKB) |
| MUStARD | Towards Multimodal Sarcasm Detection | Paper | Dataset |
| YouMakeup | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Paper | [Dataset](https://github.com/AIM3-RUC/YouMakeup) |
| MDID | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | Paper | [Dataset](https://github.com/karansikka1/documentIntent_emnlp19) |
| Social media posts from Flickr (Mental Health) | Inferring Social Media Users’ Mental Health Status from Multimodal Information | Paper | Dataset |
| Twitter MEL | Building a Multimodal Entity Linking Dataset From Tweets | Paper | [Dataset](https://github.com/OA256864/MEL_Tweets) |
| MultiMET | MultiMET: A Multimodal Dataset for Metaphor Understanding | Paper | - |
| MSDS | Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline | Paper | Dataset |
### Miscellaneous
| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| ------- | ------------------ | ----------------- | ------------------- |
| MS COCO | Microsoft COCO: Common objects in context | Paper | Dataset |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge | Paper | Dataset |
| YFCC100M | YFCC100M: The new data in multimedia research | Paper | Dataset |
| COGNIMUSE | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization | Paper | Dataset |
| SNAG | SNAG: Spoken Narratives and Gaze Dataset | Paper | Dataset |
| UR-FUNNY | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | Paper | Dataset |
| Bag-of-Lies | Bag-of-Lies: A Multimodal Dataset for Deception Detection | Paper | Dataset |
| MARC | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | Paper | Dataset |
| MuSE | MuSE: a Multimodal Dataset of Stressed Emotion | Paper | Dataset |
| BabelPic | Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts | Paper | Dataset |
| Eye4Ref | Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations | Paper | - |
| Troll Memes | A Dataset for Troll Classification of TamilMemes | Paper | Dataset |
| SEMD | EmoSen: Generating sentiment and emotion controlled responses in a multimodal dialogue system | Paper | - |
| Chat-talk Corpus | Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness | Paper | - |
| EMOTyDA | Towards Emotion-aided Multi-modal Dialogue Act Classification | Paper | Dataset |
| MELINDA | MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification | Paper | Dataset |
| NewsCLIPpings | NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media | Paper | Dataset |
| R2VQ | Designing Multimodal Datasets for NLP Challenges | Paper | Dataset |
| M2H2 | M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations | Paper | Dataset |