Google Research Datasets

Results: 70 repositories owned by Google Research Datasets

answer-equivalence-dataset

20 stars, 3 forks

This dataset contains human judgements about answer equivalence. The data is based on SQuAD (Stanford Question Answering Dataset), and contains 9k human judgements of answer candidates generated by Al...

Attributed-QA

34 stars, 11 forks

We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in information-seeking scenarios. This release consists of human-r...

dices-dataset

18 stars, 1 fork

This repository contains two datasets with multi-turn adversarial conversations generated by human agents interacting with a dialog model and rated for safety by two corresponding diverse rater pools.

GSM-IC

41 stars, 0 forks

The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K by adding irrelevant sentences to problem descriptions. GSM-IC is constructed to evalu...

Hinglish-TOP-Dataset

29 stars, 6 forks

Consists of the largest (10K) human-annotated code-switched semantic parsing dataset and 170K utterances generated using the CST5 augmentation technique. Queries are derived from TOPv2, a multi-domain ta...

presto

106 stars, 5 forks

A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs

QAmeleon

32 stars, 5 forks

QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic...

screen_qa

15 stars, 1 fork

The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K sc...

seahorse

82 stars, 7 forks

Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 quality dimensions: comprehensibility, repetition, grammar, attr...