Google Research Datasets
MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attri...
MultiReQA
We are creating a challenging new benchmark, MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval question answering (ReQA) is the task of retrieving a sentence-level...
natural-questions
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answer...
NewSHead
The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.
NewsQuizQA
NewsQuizQA is a quiz-style question-answer dataset used for generating quiz questions about the news.
noun-verb
This dataset contains naturally-occurring English sentences that feature non-trivial noun-verb ambiguity.
Nutrition5k
Detailed visual + nutritional data for over 5,000 plates of food.
paws
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identifi...