Google Research Datasets

70 repositories owned by Google Research Datasets

turkish-treebanks

27 Stars, 2 Forks

A human-annotated morphosyntactic treebank for Turkish.

tydiqa

238 Stars, 34 Forks

TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the trainin...

uibert

17 Stars, 4 Forks

This repository includes the two datasets used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item Selection (VIS) data. Both datasets are written as TFRecords.

uninum

62 Stars, 18 Forks

A database of number names for 186 languages, locales, and scripts.

Video-Timeline-Tags-ViTT

15 Stars, 1 Fork

A collection of videos annotated with timelines: each video is divided into segments, and each segment is labelled with a short free-text description.

WebRED

21 Stars, 4 Forks

WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web.

wiki-links

42 Stars, 9 Forks

Automatically exported from code.google.com/p/wiki-links

wiki-reading

266 Stars, 32 Forks

This repository contains the three WikiReading datasets as used and described in "WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia", Hewlett et al., ACL 2016 (the English Wiki...

swim-ir

41 Stars, 2 Forks

SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask promptin...

AIS

25 Stars, 1 Fork

AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external world that is verifiable in source documents, or "Attributable t...