Create dataset loader for TREC-2017 LiveQA
Adding a Dataset
- Name: TREC-2017 LiveQA
- Description: None provided
- Task: QA
- Paper: https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf
- Data: https://github.com/abachaa/LiveQA_MedicalTask_TREC2017
- License: ?
#self-assign
Hi @luou-wen, can you let us know whether you are still working on this so we can update our project board? Please let us know the status by Friday, April 8. You can respond to this comment or ping us on Slack or Discord.
No worries if you are not finished but still intend to work on this!
Hi @jason-fries, sorry for the late response. I did not see this message until today. If possible, may I pick this back up? I intend to finish the dataloader and make a pull request today.
Hi @luou-wen, yes of course! I just re-assigned you.
@hakunanatasha Thank you very much! I will continue working on it and make a pull request asap.
Hi @luou-wen, just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!
Hi @jason-fries, apologies for the delay. I am still working on it, and I will submit a PR by this Sunday at the latest.
#self-assign
@hakunanatasha @jason-fries I have a couple of questions about this dataset:
- This dataset has multiple answers for the same question, but the `bigbio_qa` schema has one answer per question. Should I create multiple uids for the same question, one per answer?
- QA tasks seem to be framed as `(question, context, answer)`, where the `answer` is supposed to appear in the `context`. This dataset doesn't seem to have a context for the annotations. Sample from one of the documents:
<SUBJECT></SUBJECT>
<MESSAGE>Literature on Cardiac amyloidosis. Please let me know where I can get literature on Cardiac amyloidosis. My uncle died yesterday from this disorder. Since this is such a rare disorder, and to honor his memory, I would like to distribute literature at his funeral service. I am a retired NIH employee, so I am familiar with the campus in case you have literature at NIH that I can come and pick up. Thank you </MESSAGE>
<SUB-QUESTIONS>
<SUB-QUESTION subqid="Q1-S1">
<ANNOTATIONS>
<FOCUS>cardiac amyloidosis</FOCUS>
<TYPE>information</TYPE>
</ANNOTATIONS>
<ANSWERS>
<ANSWER answerid="Q1-S1-A1" pairid="1">Cardiac amyloidosis is a disorder caused by deposits of an abnormal protein (amyloid) in the heart tissue. These deposits make it hard for the heart to work properly.</ANSWER>
<ANSWER answerid="Q1-S1-A2" pairid="2">The term "amyloidosis" refers not to a single disease but to a collection of diseases in which a protein-based infiltrate deposits in tissues as beta-pleated sheets. The subtype of the disease is determined by which protein is depositing; although dozens of subtypes have been described, most are incredibly rare or of trivial importance. This analysis will focus on the main systemic forms of amyloidosis, both of which frequently involve the heart.</ANSWER>
</ANSWERS>
</SUB-QUESTION>
</SUB-QUESTIONS>
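For illustration, here is a minimal sketch of both options for the multiple-answer question, using Python's standard-library `xml.etree.ElementTree` on an abridged copy of the sample above: `flatten_one_per_answer` emits one record per `<ANSWER>` with its own id (the "multiple uids" idea), and `group_answers` emits one record per sub-question with all answers in a list. The `<QUESTION qid="Q1">` wrapper, the function names, and the record keys are hypothetical. They loosely follow the `(question, context, answer)` framing from the question, not the actual `bigbio_qa` definition, and `context` is left empty because the sample has no supporting passage.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample above. The <QUESTION qid="Q1"> wrapper is a
# hypothetical root element added so the fragment parses as one document;
# the real file's enclosing element is not shown in the excerpt.
SAMPLE = """
<QUESTION qid="Q1">
  <SUBJECT></SUBJECT>
  <MESSAGE>Literature on Cardiac amyloidosis. Please let me know where I can get literature on Cardiac amyloidosis.</MESSAGE>
  <SUB-QUESTIONS>
    <SUB-QUESTION subqid="Q1-S1">
      <ANNOTATIONS>
        <FOCUS>cardiac amyloidosis</FOCUS>
        <TYPE>information</TYPE>
      </ANNOTATIONS>
      <ANSWERS>
        <ANSWER answerid="Q1-S1-A1" pairid="1">Cardiac amyloidosis is a disorder caused by deposits of an abnormal protein (amyloid) in the heart tissue.</ANSWER>
        <ANSWER answerid="Q1-S1-A2" pairid="2">The term "amyloidosis" refers not to a single disease but to a collection of diseases.</ANSWER>
      </ANSWERS>
    </SUB-QUESTION>
  </SUB-QUESTIONS>
</QUESTION>
"""


def _question_text(question_elem):
    # The sample's <SUBJECT> is empty, so the <MESSAGE> body serves as the question.
    return (question_elem.findtext("MESSAGE") or "").strip()


def flatten_one_per_answer(question_elem):
    """Option 1 (hypothetical layout): one record per <ANSWER>, question text repeated."""
    question = _question_text(question_elem)
    for subq in question_elem.iter("SUB-QUESTION"):
        subqid = subq.get("subqid")
        qtype = subq.findtext("ANNOTATIONS/TYPE")
        for ans in subq.iter("ANSWER"):
            yield {
                "id": ans.get("answerid"),  # unique per answer, e.g. "Q1-S1-A1"
                "question_id": subqid,  # shared by every answer to this sub-question
                "question": question,
                "type": qtype,
                "context": "",  # no supporting passage in the source data
                "answer": (ans.text or "").strip(),
            }


def group_answers(question_elem):
    """Option 2 (hypothetical layout): one record per sub-question, answers in a list."""
    question = _question_text(question_elem)
    for subq in question_elem.iter("SUB-QUESTION"):
        yield {
            "id": subq.get("subqid"),
            "question": question,
            "type": subq.findtext("ANNOTATIONS/TYPE"),
            "context": "",
            "answer": [(a.text or "").strip() for a in subq.iter("ANSWER")],
        }


root = ET.fromstring(SAMPLE.strip())
records = list(flatten_one_per_answer(root))
grouped = list(group_answers(root))

for rec in records:
    print(rec["id"], rec["question_id"], rec["type"])
print(grouped[0]["id"], len(grouped[0]["answer"]), "answers")
```

The per-answer layout keeps each record to a single answer string at the cost of repeating the question text; the grouped layout avoids the duplication but only fits if the target schema's answer field accepts a list.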