Google Research Datasets

70 repositories owned by Google Research Datasets

wit

Stars: 961

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

google-research-datasets

Topics: cc-by-sa-3, machine-learning, multilingual, multimodal

dstc8-schema-guided-dialogue

Stars: 525
Forks: 122

The Schema-Guided Dialogue Dataset

google-research-datasets

Topics: assistant, dataset, dialogue, dialogue-systems

Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera...

google-research-datasets

Topics: 3d-reconstruction, 3d-vision

conceptual-12m

Stars: 327

Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

google-research-datasets

Topics: multimodal-dataset, pre-training, vision-and-language

coarse-discourse

Stars: 237

A large corpus of discourse annotations and relations on ~10K forum threads.

google-research-datasets

conceptual-captions

Stars: 482

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for training and evaluating machine-learned image captioning systems.

google-research-datasets

cvss

Stars: 167

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

google-research-datasets

dakshina

Stars: 178

The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the dataset includes a large collection of native script Wikipedia text,...

google-research-datasets