download this collection
Hi, whenever I try to download the QReCC collection, it returns a 503 error. Could you tell me the size of collection-paragraph, the version of the collection that is split into smaller passages? If it is not too big, could you share it with us?
Were you able to resolve the issue? When you follow do you get 54M passages as mentioned? @RavitejaAnantha @tuzhucheng
Sorry about the late reply. You can find a pre-built collection of passages here on AWS S3: aws s3 ls s3://mt-qrecc/collection-paragraph/.
@tuzhucheng I get Access Denied when I run ls on your S3 bucket. Could you confirm?
BTW, the raw web pages can be downloaded from Zenodo (passages.zip).
Hmm, I just tried to make it public again, please retry.
Tried again, still Access Denied:
- aws s3 ls s3://mt-qrecc/collection-paragraph/
- curl https://mt-qrecc.s3.amazonaws.com/collection-paragraph/
Hmm, what if you try the https url: https://mt-qrecc.s3.us-west-2.amazonaws.com/collection-paragraph/collection-paragraph.tar.gz.partaa? The file names are collection-paragraph.tar.gz.partaa to collection-paragraph.tar.gz.partaz (26 files).
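For anyone scripting the download, here is a minimal sketch that builds the 26 part URLs from the pattern above and streams them into one archive. It assumes the parts are byte-wise splits of a single tar.gz, so concatenating them in alphabetical order restores the archive; that is not confirmed in this thread. The names `part_urls` and `download_and_join` are hypothetical helpers, not part of any QReCC tooling.

```python
import shutil
import string
import urllib.request

# Base URL and file-name pattern are taken from the reply above.
BASE_URL = "https://mt-qrecc.s3.us-west-2.amazonaws.com/collection-paragraph/"
PREFIX = "collection-paragraph.tar.gz.parta"


def part_urls():
    """Return the 26 part URLs, partaa through partaz, in order."""
    return [BASE_URL + PREFIX + letter for letter in string.ascii_lowercase]


def download_and_join(dest="collection-paragraph.tar.gz"):
    """Stream every part, in order, into one file (hypothetical helper).

    Assumption: the parts are plain splits of one archive, so simple
    concatenation is enough to rebuild it.
    """
    with open(dest, "wb") as out:
        for url in part_urls():
            with urllib.request.urlopen(url) as resp:
                shutil.copyfileobj(resp, out)
    return dest
```

Calling `download_and_join()` writes `collection-paragraph.tar.gz` to the current directory. The joined file is large, so check free disk space first.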
Hello, for the turns where the "Truth_passages" field is empty, is this due to missing annotations, or is there another reason? For example, the first turn of conversation 1 in the test set:
{
"Answer_URL": "https://explorehealthcareers.org/career/medicine/physician-assistant/",
"Context": [],
"Conversation_no": 1,
"Conversation_source": "trec",
"Question": "What is a physician's assistant?",
"Transformer_rewrite": "What is a physician's assistant",
"Truth_answer": "physician assistants are medical providers who are licensed to diagnose and treat illness and disease and to prescribe medication for patients",
"Truth_passages": [],
"Truth_rewrite": "What is a physician's assistant?",
"Turn_no": 1
}
There is another question I'd like to ask. Regarding the first turn of conversation 1 in the mentioned test set, its truth answer corresponds to the sentence in the paragraph at http://web.archive.org/web/20200106012242id_/https://explorehealthcareers.org/career/medicine/physician-assistant/_p0. However, the test set does not have a truth passage labeled for it.
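To gauge how widespread this is, a rough sketch like the following lists every turn whose "Truth_passages" field is empty. It assumes the test set is a single JSON file containing a list of turn objects shaped like the example above; the file name in the usage note is a placeholder, and `load_turns` and `turns_without_passages` are hypothetical helper names.

```python
import json


def load_turns(path):
    """Load a JSON file assumed to hold a list of turn objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def turns_without_passages(turns):
    """Return (Conversation_no, Turn_no) for turns with no truth passages."""
    return [
        (turn["Conversation_no"], turn["Turn_no"])
        for turn in turns
        if not turn.get("Truth_passages")  # missing key or empty list
    ]
```

For example, `turns_without_passages(load_turns("qrecc_test.json"))` would return the affected turns, where `qrecc_test.json` stands in for wherever the test set is stored locally.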