Google Research Datasets

Results: 70 repositories owned by Google Research Datasets


wiki-atomic-edits

104 Stars

A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

google-research-datasets

deep-learning

deep-neural-networks

nlp

nlp-machine-learning

TextNormalizationCoveringGrammars


Covering grammars for English and Russian text normalization

google-research-datasets

nlp

speech-recognition

text-to-speech

bam


google-research-datasets

boolean-questions

129 Stars

google-research-datasets

C4_200M-synthetic-dataset-for-grammatical-error-correction

148 Stars

This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the data...

google-research-datasets

ccpe


A dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz met...

google-research-datasets

circa


The Circa (meaning ‘approximately’) dataset aims to help machine learning systems solve the problem of interpreting indirect answers to polar questions. The dataset contains pairs of yes/no questions a...

google-research-datasets

clang8


cLang-8 is a dataset for grammatical error correction.

google-research-datasets

clay


The dataset includes UI object type labels (e.g., BUTTON, IMAGE, CHECKBOX) that describe the semantic type of a UI object on Android app screenshots. It is used for training and evaluation of the sc...

google-research-datasets

Crisscrossed-Captions


Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

google-research-datasets