biomedical Create dataset loader for TREC-2017 LiveQA

Open jason-fries opened this issue 2 years ago • 9 comments

Adding a Dataset

Name: TREC-2017 LiveQA
Description: None provided
Task: QA
Paper: https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf
Data: https://github.com/abachaa/LiveQA_MedicalTask_TREC2017
License: ?

Mar 22 '22 00:03 jason-fries

#self-assign

Apr 01 '22 10:04 luou-wen

Hi @luou-wen, can you let us know if you are still working on this so we can update our project board? Please let us know the status by Friday, April 8. You can respond to this comment or ping us on Slack or Discord.

No worries if you are not finished but still intend to work on this!

Apr 07 '22 22:04 jason-fries

Hi @jason-fries sorry for the late response. I did not see this message until today. If possible, may I pick this back up? I was intending to finish the dataloader and make a pull request today.

Apr 10 '22 08:04 luou-wen

Hi @luou-wen yes of course! I just re-assigned you.

Apr 10 '22 16:04 hakunanatasha

@hakunanatasha Thank you very much! I will continue working on it and make a pull request asap.

Apr 10 '22 18:04 luou-wen

Hi @luou-wen, Just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!

Apr 19 '22 22:04 jason-fries

Hi @jason-fries, Apologies for the delay. I am still working on it, and I will submit a PR by this Sunday at the latest.

Apr 19 '22 22:04 luou-wen

#self-assign

Jun 07 '22 20:06 shamikbose

@hakunanatasha @jason-fries I have a couple of questions about this dataset:

This dataset has multiple answers for the same question. The bigbio_qa schema has one answer per question. Should I create multiple uids for the same question with different answers?
QA tasks seem to be framed as (question, context, answer), where the answer is supposed to be in the context. This dataset doesn't seem to have a context for the annotations. Sample from one of the documents:

<SUBJECT></SUBJECT>
	<MESSAGE>Literature on Cardiac amyloidosis.  Please let me know where I can get literature on Cardiac amyloidosis.  My uncle died yesterday from this disorder.  Since this is such a rare disorder, and to honor his memory, I would like to distribute literature at his funeral service.  I am a retired NIH employee, so I am familiar with the campus in case you have literature at NIH that I can come and pick up.  Thank you </MESSAGE>
	<SUB-QUESTIONS>
		<SUB-QUESTION subqid="Q1-S1">
			<ANNOTATIONS>
				<FOCUS>cardiac amyloidosis</FOCUS>
				<TYPE>information</TYPE>
			</ANNOTATIONS>
			<ANSWERS>
				<ANSWER answerid="Q1-S1-A1" pairid="1">Cardiac amyloidosis is a disorder caused by deposits of an abnormal protein (amyloid) in the heart tissue. These deposits make it hard for the heart to work properly.</ANSWER>
				<ANSWER answerid="Q1-S1-A2" pairid="2">The term "amyloidosis" refers not to a single disease but to a collection of diseases in which a protein-based infiltrate deposits in tissues as beta-pleated sheets. The subtype of the disease is determined by which protein is depositing; although dozens of subtypes have been described, most are incredibly rare or of trivial importance. This analysis will focus on the main systemic forms of amyloidosis, both of which frequently involve the heart.</ANSWER>
			</ANSWERS>
		</SUB-QUESTION>
	</SUB-QUESTIONS>
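
For illustration, the first option (one uid per question/answer pair) could look roughly like the sketch below. It is only a sketch under assumptions: the `<ROOT>` wrapper, the function name `one_row_per_answer`, and the record keys are hypothetical and are not taken from the bigbio_qa schema definition; the answer texts are trimmed copies of the sample above; and `context` is left empty because the sample has none.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above, wrapped in a placeholder <ROOT> element
# (the wrapper is an assumption; the quoted sample has no enclosing tag).
SAMPLE = """<ROOT>
<SUBJECT></SUBJECT>
<MESSAGE>Please let me know where I can get literature on Cardiac amyloidosis.</MESSAGE>
<SUB-QUESTIONS>
  <SUB-QUESTION subqid="Q1-S1">
    <ANNOTATIONS>
      <FOCUS>cardiac amyloidosis</FOCUS>
      <TYPE>information</TYPE>
    </ANNOTATIONS>
    <ANSWERS>
      <ANSWER answerid="Q1-S1-A1" pairid="1">Cardiac amyloidosis is a disorder caused by deposits of an abnormal protein (amyloid) in the heart tissue.</ANSWER>
      <ANSWER answerid="Q1-S1-A2" pairid="2">The term "amyloidosis" refers not to a single disease but to a collection of diseases.</ANSWER>
    </ANSWERS>
  </SUB-QUESTION>
</SUB-QUESTIONS>
</ROOT>"""


def one_row_per_answer(xml_text):
    """Yield one flat record per (sub-question, answer) pair."""
    root = ET.fromstring(xml_text)
    message = (root.findtext("MESSAGE") or "").strip()
    for sub in root.iter("SUB-QUESTION"):
        for ans in sub.iter("ANSWER"):
            yield {
                "id": ans.get("answerid"),          # unique per answer
                "question_id": sub.get("subqid"),   # shared by all answers to one sub-question
                "question": message,
                "type": sub.findtext("ANNOTATIONS/TYPE"),
                "context": "",                      # the sample provides no context passage
                "answer": (ans.text or "").strip(),
            }


rows = list(one_row_per_answer(SAMPLE))
for row in rows:
    print(row["id"], row["question_id"], row["answer"][:40])
```

Here the `answerid` attribute serves as the per-row uid while `subqid` ties the rows back to the same question, so the multiple answers stay linked even though each record carries a single answer.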

Jun 08 '22 15:06 shamikbose
