Closes #502

Open nomisto opened this issue 2 years ago • 1 comments

This is a QA dataset available in two languages, English (en) and Spanish (es), so there are two subsets containing the same questions: head_qa_en and head_qa_es. I also implemented a translation (T2T) config in de3f664 with subset_id head_qa, although translation is not the intended purpose of this dataset.

This gives 6 configs in total:

  • head_qa_en_source, head_qa_en_bigbio_qa
  • head_qa_es_source, head_qa_es_bigbio_qa
  • head_qa_source (Merge of head_qa_en_source and head_qa_es_source), head_qa_bigbio_t2t

However, I get the following error when running the test, for example with:

(venv) PS C:\Users\Simon\biomedical> python -m tests.test_bigbio biodatasets/head_qa/head_qa.py --subset_id head_qa_en

...

head_qa_en_bigbio_t2t not found. Available: ['head_qa_source', 'head_qa_en_source', 'head_qa_es_source', 'head_qa_bigbio_t2t', 'head_qa_en_bigbio_qa', 'head_qa_es_bigbio_qa']

How should I proceed? There cannot be a "head_qa_en_bigbio_t2t" config, since the t2t schema pairs both languages and is therefore not language-specific.
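The mismatch can be reproduced outside the test suite with a minimal sketch. It assumes, based only on the error message above, that the test builds each expected config name as `<subset_id>_<schema>`; the `SCHEMAS` list and the `missing_configs` helper are hypothetical names used for illustration and are not taken from `tests/test_bigbio`.

```python
# Config names exactly as listed in the error message above.
AVAILABLE = [
    "head_qa_source",
    "head_qa_en_source",
    "head_qa_es_source",
    "head_qa_bigbio_t2t",
    "head_qa_en_bigbio_qa",
    "head_qa_es_bigbio_qa",
]

# Assumption: schema suffixes that are checked for every subset_id.
SCHEMAS = ["source", "bigbio_qa", "bigbio_t2t"]


def missing_configs(subset_id, available=AVAILABLE, schemas=SCHEMAS):
    """Return the '<subset_id>_<schema>' names that are not in `available`."""
    expected = [f"{subset_id}_{schema}" for schema in schemas]
    return [name for name in expected if name not in available]


for subset_id in ("head_qa_en", "head_qa_es", "head_qa"):
    print(subset_id, "->", missing_configs(subset_id))
```

Under that assumption, each language subset is missing its `bigbio_t2t` config and the merged `head_qa` subset is missing a `bigbio_qa` config, so no single subset_id passes as long as every schema is expected for every subset.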

nomisto, Apr 22 '22 11:04

I've now removed the t2t schema so that the dataset can be merged at any time. If the dataset should include the (non-native) translation task and there is a solution to my config problem, I can re-add it.

nomisto, Apr 27 '22 13:04