
Missing dataset config names in P3

Open ari9dam opened this issue 1 year ago • 2 comments

The README says: "For reproducibility, we have released an already cached version of the data (https://huggingface.co/datasets/bigscience/P3), which means you don't need to cache the data yourself. The only exception is Story Cloze (https://cs.rochester.edu/nlp/rocstories/)." However, upon inspecting P3 I see 92 config names that are present in the seqio mixtures but missing from P3. Here is my code to find the missing dataset config names:

import datasets
import seqio
import t0.seqio_tasks  # registers the T0 tasks and mixtures with seqio

# Config names available in the released P3 cache
ds = datasets.get_dataset_config_names("bigscience/P3")

# Task names referenced by each T0 mixture
t0_train_task_names = [task.name for task in seqio.MixtureRegistry.get("t0_train").tasks]
t0p_train_task_names = [task.name for task in seqio.MixtureRegistry.get("t0+_train").tasks]
t0pp_train_task_names = [task.name for task in seqio.MixtureRegistry.get("t0++_train").tasks]
t0_eval_score_eval = [task.name for task in seqio.MixtureRegistry.get("t0_eval_score_eval").tasks]
t0_train_score_eval = [task.name for task in seqio.MixtureRegistry.get("t0_train_score_eval").tasks]

all_tasks = set()
for tasks in [t0_train_task_names, t0p_train_task_names, t0pp_train_task_names, t0_eval_score_eval, t0_train_score_eval]:
    all_tasks.update(tasks)

# Task names referenced by the mixtures but absent from the P3 configs
missing = [t for t in all_tasks if t not in ds]
print(missing)

The missing pieces are: wiki_qa_found_on_google_score_eval quartz_having_read_above_passage_score_eval ag_news_classify_score_eval cosmos_qa_no_prompt_text_score_eval glue_qqp_duplicate_or_not_score_eval wiki_qa_Decide_good_answer_score_eval cosmos_qa_context_description_question_answer_text_score_eval rotten_tomatoes_Writer_Expressed_Sentiment_score_eval qasc_qa_with_separated_facts_1_score_eval sciq_Direct_Question_score_eval qasc_qa_with_separated_facts_2_score_eval quartz_paragraph_question_plain_concat_score_eval cosmos_qa_no_prompt_id_score_eval story_cloze_2016_Novel_Correct_Ending_score_eval wiki_qa_Is_This_True__score_eval quail_context_question_answer_description_id_score_eval quartz_given_the_fact_answer_the_q_score_eval sciq_Multiple_Choice_Question_First_score_eval rotten_tomatoes_Reviewer_Expressed_Sentiment_score_eval quail_context_question_description_answer_id_score_eval qasc_qa_with_separated_facts_3_score_eval yelp_review_full_format_rating_score_eval quail_context_question_answer_description_text_score_eval story_cloze_2016_Story_Continuation_and_Options_score_eval yelp_review_full_on_a_scale_score_eval rotten_tomatoes_Movie_Expressed_Sentiment_2_score_eval yelp_review_full_this_place_score_eval ag_news_which_section_choices_score_eval cos_e_v1.11_question_option_description_text_score_eval glue_mrpc_want_to_know_score_eval cos_e_v1.11_description_question_option_text_score_eval wiki_qa_exercise_score_eval cos_e_v1.11_question_description_option_text_score_eval qasc_qa_with_separated_facts_4_score_eval glue_mrpc_replace_score_eval quail_description_context_question_answer_text_score_eval glue_qqp_meaning_score_eval quail_context_description_question_answer_text_score_eval glue_mrpc_same_thing_score_eval rotten_tomatoes_Reviewer_Opinion_bad_good_choices_score_eval cosmos_qa_context_question_description_answer_text_score_eval quail_no_prompt_id_score_eval cos_e_v1.11_description_question_option_id_score_eval cos_e_v1.11_question_description_option_id_score_eval qasc_qa_with_separated_facts_5_score_eval cosmos_qa_context_description_question_text_score_eval yelp_review_full_format_score_score_eval ag_news_classify_with_choices_score_eval cosmos_qa_context_description_question_answer_id_score_eval quartz_use_info_from_paragraph_question_score_eval quail_description_context_question_answer_id_score_eval cosmos_qa_description_context_question_text_score_eval glue_mrpc_equivalent_score_eval yelp_review_full_based_on_that_score_eval social_i_qa_Show_choices_and_generate_index_score_eval quail_no_prompt_text_score_eval ag_news_which_section_score_eval rotten_tomatoes_Sentiment_with_choices__score_eval quartz_answer_question_below_score_eval social_i_qa_Show_choices_and_generate_answer_score_eval glue_qqp_duplicate_score_eval ag_news_recommend_score_eval story_cloze_2016_Answer_Given_options_score_eval cosmos_qa_description_context_question_answer_text_score_eval ag_news_classify_with_choices_question_first_score_eval quartz_read_passage_below_choose_score_eval quail_context_question_description_answer_text_score_eval wiki_qa_automatic_system_score_eval sciq_Multiple_Choice_score_eval story_cloze_2016_Movie_What_Happens_Next_score_eval quartz_use_info_from_question_paragraph_score_eval yelp_review_full_format_star_score_eval sciq_Direct_Question_Closed_Book__score_eval social_i_qa_I_was_wondering_score_eval ag_news_classify_question_first_score_eval glue_qqp_quora_score_eval yelp_review_full_so_i_would_score_eval story_cloze_2016_Choose_Story_Ending_score_eval 
quartz_answer_question_based_on_score_eval cos_e_v1.11_question_option_description_id_score_eval social_i_qa_Generate_answer_score_eval rotten_tomatoes_Reviewer_Enjoyment_score_eval rotten_tomatoes_Movie_Expressed_Sentiment_score_eval glue_qqp_same_thing_score_eval quail_context_description_question_answer_id_score_eval rotten_tomatoes_Text_Expressed_Sentiment_score_eval cosmos_qa_description_context_question_answer_id_score_eval rotten_tomatoes_Reviewer_Enjoyment_Yes_No_score_eval rotten_tomatoes_Reviewer_Sentiment_Feeling_score_eval glue_mrpc_paraphrase_score_eval cosmos_qa_context_question_description_answer_id_score_eval social_i_qa_Check_if_a_random_answer_is_valid_or_not_score_eval

Could you please shed some light on this? I'm trying to reproduce (close enough is good) T0 training in PyTorch. Thank you.

ari9dam · Jul 12 '22 01:07

It seems that the missing datasets are all from the t0_eval_score_eval and t0_train_score_eval mixtures.
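To check where each missing name comes from, a minimal sketch reusing the missing list and the per-mixture name lists from the script above:

# Count, per mixture, how many of the missing configs it references
# (reuses `missing` and the name lists from the script above).
mixtures = {
    "t0_train": t0_train_task_names,
    "t0+_train": t0p_train_task_names,
    "t0++_train": t0pp_train_task_names,
    "t0_eval_score_eval": t0_eval_score_eval,
    "t0_train_score_eval": t0_train_score_eval,
}
for name, task_names in mixtures.items():
    hits = sorted(t for t in missing if t in task_names)
    print(f"{name}: {len(hits)} missing")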

ari9dam · Jul 12 '22 03:07

Hi @ari9dam , thanks for the ping!

Except for the 5 Story Cloze configs (which we won't release because the dataset is behind an agreement), the 87 remaining missing ones are those in t0_train_score_eval, which contains the evaluation sets of the training datasets. We use these evaluation sets to perform checkpoint selection. I didn't think anyone would ever need them; I'm happy to include them in bigscience/P3 too, though I won't have the bandwidth to do so before next week.
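In the meantime, one way to regenerate such a split is to build it from the registered seqio task directly instead of the P3 cache. A minimal sketch, assuming the t-zero tasks are registered as above; the sequence lengths and the "validation" split name here are assumptions, not confirmed values:

import seqio
import t0.seqio_tasks  # registers the T0 tasks and mixtures

# Any *_score_eval name from the list above works; this one is just an example.
task = seqio.get_mixture_or_task("cosmos_qa_no_prompt_text_score_eval")

# Build the examples from the source dataset instead of the P3 cache.
# Sequence lengths and split name are assumed, not taken from the repo config.
ds = task.get_dataset(
    sequence_length={"inputs": 1024, "targets": 256},
    split="validation",
    use_cached=False,
)
for ex in ds.take(1).as_numpy_iterator():
    print(ex.keys())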

However, I have them cached somewhere if you need them. Could you email me?

VictorSanh · Jul 12 '22 08:07