ml-qrecc: downloading this collection

Hi, when I try to download the QReCC collection, it always returns a 503 error. What is the size of collection-paragraph, the version of the collection split into small paragraph-level passages? If it is not too big, could you share it with us?

Jun 16 '23 10:06 bbei-z

Were you able to resolve the issue? When you follow the instructions, do you get the 54M passages mentioned? @RavitejaAnantha @tuzhucheng
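One way to check whether a built or downloaded collection has the expected ~54M passages is to count its records. This is a minimal sketch that assumes the collection is stored as JSON Lines (one passage per line) in `*.jsonl` or `*.jsonl.gz` files under a single directory; that layout is an assumption and is not confirmed in this thread.

```python
import gzip
from pathlib import Path


def count_passages(collection_dir: str) -> int:
    """Count non-empty lines across *.jsonl and *.jsonl.gz files (assumed layout)."""
    total = 0
    for path in sorted(Path(collection_dir).rglob("*.jsonl*")):
        # Use gzip for compressed shards, plain open() otherwise.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as f:
            total += sum(1 for line in f if line.strip())
    return total
```

For example, `print(f"{count_passages('collection-paragraph'):,}")`, where the directory name is a placeholder for wherever the collection was extracted.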

Oct 02 '24 11:10 wickcode

Sorry about the late reply. You can find a pre-built collection of passages here on AWS S3: aws s3 ls s3://mt-qrecc/collection-paragraph/.

Nov 18 '24 06:11 tuzhucheng

@tuzhucheng I get Access Denied when listing your S3 bucket. Could you check the permissions?

BTW, the raw web pages can be downloaded from Zenodo (passages.zip).

Dec 19 '24 23:12 hankcs

Hmm, I just tried to make it public again. Please retry.

Jan 17 '25 19:01 tuzhucheng

Tried again, still Access Denied:

aws s3 ls s3://mt-qrecc/collection-paragraph/
curl https://mt-qrecc.s3.amazonaws.com/collection-paragraph/

Jan 18 '25 01:01 hankcs

Hmm, what about the HTTPS URL https://mt-qrecc.s3.us-west-2.amazonaws.com/collection-paragraph/collection-paragraph.tar.gz.partaa instead? The files are named collection-paragraph.tar.gz.partaa through collection-paragraph.tar.gz.partaz (26 files).
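The split archive can be fetched and reassembled with the standard library. This is a sketch under assumptions: the base URL and the `partaa` to `partaz` names come from the reply above, the bucket may no longer be public, and the helper names (`part_names`, `remote_size`, `download`, `concatenate`) are hypothetical. A HEAD request gives each part's size (which also answers the original size question) without downloading it, and 503 responses are retried with exponential backoff.

```python
import shutil
import time
import urllib.error
import urllib.request
from pathlib import Path
from string import ascii_lowercase

# Base URL and file-name pattern taken from the thread; access is not guaranteed.
BASE_URL = "https://mt-qrecc.s3.us-west-2.amazonaws.com/collection-paragraph/"
PREFIX = "collection-paragraph.tar.gz.part"


def part_names() -> list[str]:
    """The 26 part names, partaa .. partaz (split's default two-letter suffixes)."""
    return [f"{PREFIX}a{c}" for c in ascii_lowercase]


def remote_size(url: str) -> int:
    """Size in bytes from a HEAD request; raises HTTPError on e.g. 403 or 503."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return int(resp.headers["Content-Length"])


def download(url: str, dest: Path, retries: int = 5) -> None:
    """Stream url to dest, retrying with exponential backoff on HTTP 503 only."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # waits 1 s, 2 s, 4 s, ...


def concatenate(parts: list[Path], output: Path) -> None:
    """Join the parts in order, like `cat partaa partab ... > output`."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

A possible workflow: sum `remote_size(BASE_URL + name)` over `part_names()` to see the total size, call `download` for each part, join them with `concatenate` into collection-paragraph.tar.gz, and extract that with `tar -xzf`.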

Feb 05 '25 08:02 tuzhucheng


Hello, for turns where the "Truth_passages" field is empty, is that because the annotation is missing, or is there another reason? For example, the first turn of conversation 1 in the test set:

{
    "Answer_URL": "https://explorehealthcareers.org/career/medicine/physician-assistant/",
    "Context": [],
    "Conversation_no": 1,
    "Conversation_source": "trec",
    "Question": "What is a physician's assistant?",
    "Transformer_rewrite": "What is a physician's assistant",
    "Truth_answer": "physician assistants are medical providers who are licensed to diagnose and treat illness and disease and to prescribe medication for patients",
    "Truth_passages": [],
    "Truth_rewrite": "What is a physician's assistant?",
    "Turn_no": 1
}
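To see how often this happens, a quick sketch can list every turn with an empty `Truth_passages` list and count them per `Conversation_source`. It assumes the test file is a JSON array of turn objects shaped like the example above; the function names are hypothetical.

```python
import json
from collections import Counter


def load_turns(path: str) -> list[dict]:
    """Load a QReCC split, assumed to be a JSON array of turn objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def empty_truth_passage_turns(turns: list[dict]) -> list[tuple[int, int]]:
    """(Conversation_no, Turn_no) for every turn whose Truth_passages is empty or missing."""
    return [(t["Conversation_no"], t["Turn_no"]) for t in turns if not t.get("Truth_passages")]


def summarize_by_source(turns: list[dict]) -> Counter:
    """Number of turns with empty Truth_passages per Conversation_source."""
    return Counter(t["Conversation_source"] for t in turns if not t.get("Truth_passages"))
```

For instance, `summarize_by_source(load_turns("qrecc_test.json"))`, where the file name is a placeholder for the local copy of the test split.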

Mar 26 '25 06:03 lujiarui-iie

I have another question about the same example, the first turn of conversation 1 in the test set: its truth answer matches a sentence in the passage http://web.archive.org/web/20200106012242id_/https://explorehealthcareers.org/career/medicine/physician-assistant/_p0. However, the test set does not label any truth passage for this turn.
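Cases like this can be checked mechanically. The sketch below assumes, based only on the ID quoted above, that a passage ID is the archived page URL followed by `_p<index>`, and it uses a simple hypothetical matching rule (lowercase, strip punctuation, collapse whitespace, then substring test); how passage texts are loaded from the collection is left open.

```python
import string


def split_passage_id(passage_id: str) -> tuple[str, int]:
    """Split 'URL_pN' into (URL, N); the _pN suffix is assumed to be a paragraph index."""
    url, sep, idx = passage_id.rpartition("_p")
    if not sep or not idx.isdigit():
        raise ValueError(f"unexpected passage id: {passage_id!r}")
    return url, int(idx)


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def answer_in_passage(answer: str, passage_text: str) -> bool:
    """True if the normalized answer occurs verbatim in the normalized passage."""
    return normalize(answer) in normalize(passage_text)
```

Running `answer_in_passage` over every turn with an empty `Truth_passages` list, against the passages from its `Answer_URL`, would show how many such turns have an answer-bearing passage in the collection.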

Mar 26 '25 07:03 lujiarui-iie