Google Research Datasets

70 repositories owned by Google Research Datasets

wiki-atomic-edits

104 stars, 8 forks

A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

TextNormalizationCoveringGrammars

60 stars, 15 forks

Covering grammars for English and Russian text normalization

bam

47 stars, 8 forks

This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the data...

ccpe

23 stars, 4 forks

A dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz met...

circa

19 stars, 3 forks

The Circa (meaning ‘approximately’) dataset aims to help machine learning systems interpret indirect answers to polar questions. The dataset contains pairs of yes/no questions a...

clang8

89 stars, 5 forks

cLang-8 is a dataset for grammatical error correction.

clay

32 stars, 3 forks

The dataset includes UI object type labels (e.g., BUTTON, IMAGE, CHECKBOX) that describe the semantic type of a UI object on Android app screenshots. It is used for training and evaluation of the sc...

Crisscrossed-Captions

47 stars, 3 forks

Extended intramodal and intermodal semantic similarity judgments for MS-COCO