Google Research Datasets
relation-extraction-corpus
Automatically exported from code.google.com/p/relation-extraction-corpus
RxR
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telu...
screen2words
The dataset includes screen summaries that describe the functionality of Android app screenshots. It is used for training and evaluation of the screen2words models (our paper accepted by UIST'21 will be...
sentence-compression
Large corpus of uncompressed and compressed sentences from news articles.
synthetic-fur
A procedurally generated synthetic fur dataset with conditional inputs for machine learning and neural rendering.
Taskmaster
Please see the README file as well as our 2019 EMNLP paper.
TF-IDF-IIF-top100-wordlists
Word lists for a variety of languages, each containing words that are distinctive to that language.
ToTTo
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, prod...