Closes #427

Open nomisto opened this issue 2 years ago • 4 comments

The dataset contains 8 different subset_ids (different dataset settings), each with a bigbio and a source schema.

Furthermore, there is a subset called mediqa_ans_all which includes all data (articles, sections, URLs of documents, all four different kinds of summaries, ...). I did not implement a bigbio schema for this "all" view, as I don't think it makes sense here. Because the bigbio schema is missing for it, the tests fail for subset mediqa_ans_all.

Tests:

python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_all
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_page2answer_multi_abstractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_page2answer_multi_extractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_page2answer_single_abstractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_page2answer_single_extractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_section2answer_multi_abstractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_section2answer_multi_extractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_section2answer_single_abstractive
python -m tests.test_bigbio biodatasets/mediqa_ans/mediqa_ans.py --subset_id mediqa_ans_section2answer_single_extractive
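
The nine commands above follow a regular naming pattern, so they can also be generated rather than typed out. A minimal sketch, assuming the subset ids are exactly the ones listed here; the helper names `subset_ids` and `test_commands` are hypothetical and not part of the repository:

```python
from itertools import product

# Path to the dataloader script, copied from the commands listed above.
SCRIPT = "biodatasets/mediqa_ans/mediqa_ans.py"


def subset_ids():
    """Return the "all" view plus the 8 task-specific subset ids (hypothetical helper)."""
    ids = ["mediqa_ans_all"]
    settings = product(
        ("page2answer", "section2answer"),  # input granularity
        ("multi", "single"),                # multi- or single-document
        ("abstractive", "extractive"),      # summary type
    )
    for source, docs, kind in settings:
        ids.append(f"mediqa_ans_{source}_{docs}_{kind}")
    return ids


def test_commands():
    """Build one test invocation per subset id (hypothetical helper)."""
    return [
        f"python -m tests.test_bigbio {SCRIPT} --subset_id {sid}"
        for sid in subset_ids()
    ]


if __name__ == "__main__":
    for cmd in test_commands():
        print(cmd)
```

Running the script only prints the commands; it does not execute the tests.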

nomisto avatar Apr 12 '22 08:04 nomisto

Hi @sunnnymskang, sure, I've added a description to the value of _DESCRIPTION and to the docstring.

nomisto avatar Apr 26 '22 07:04 nomisto

@nomisto Can you remind me why this fits the t2t schema better than question answering? We want to merge this PR asap; it looks mostly ok.

hakunanatasha avatar Apr 27 '22 04:04 hakunanatasha

Hi @hakunanatasha, the name of this dataset is a little misleading: it is a summarization task, more specifically an answer summarization task. The input is question + answer, and the task is to generate a summary of that answer.
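
To illustrate the mapping, here is a rough sketch of how one example could look in a text-to-text record. It is not the code from this PR: the function name `to_t2t_example` is hypothetical, joining question and answer with a single space is an assumption, and the field names reflect my understanding of the bigbio t2t schema.

```python
def to_t2t_example(uid, question, answer, summary):
    """Map one answer-summarization example to a text-to-text style record.

    Hypothetical helper for illustration only.
    """
    return {
        "id": str(uid),
        "document_id": str(uid),
        # Input side: question plus the answer to be summarized
        # (joining them with a space is an assumption).
        "text_1": f"{question} {answer}",
        # Target side: the summary of that answer.
        "text_2": summary,
        # Left blank here, since there are no meaningful names for either text.
        "text_1_name": "",
        "text_2_name": "",
    }


if __name__ == "__main__":
    print(to_t2t_example(0, "What is X?", "X is a long answer ...", "X is ..."))
```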

nomisto avatar Apr 27 '22 06:04 nomisto

@nomisto got it; I'll merge this later today. Sorry for the hold up. I assume since it's a summarization, the text-1/2-name are also blank as there is nothing to update here.

hakunanatasha avatar Apr 27 '22 15:04 hakunanatasha