
Can I use document retriever component only?

Open serenayj opened this issue 3 years ago • 3 comments

Hi,

Congrats on finishing such nice work! I would like to test my own encoder (document reader), so I want to use only the IR document retriever component. Could you tell me where I can find this part of the code and how to run it? Thank you in advance!

serenayj · Jun 23 '22 19:06

Sorry for the late reply, and thanks for reaching out! This code base provides an Elasticsearch-based IR baseline, which you can set up by following the README file. For text (sentence or paragraph) retrieval specifically, refer to this file: https://github.com/jind11/MedQA/blob/master/IR/aristomini/solvers/textsearch.py
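For readers who want a rough picture of what an Elasticsearch-based text retriever does, here is a minimal sketch. It is not the code in `textsearch.py`: the helper names `build_match_query` and `extract_passages`, and the field name `text`, are assumptions made for illustration. It only builds a standard `match` query body and reads passages out of a search response, so it runs without a live cluster.

```python
def build_match_query(question, k=5, field="text"):
    """Build a search body asking for the top-k passages matching `question`.

    `field` is a hypothetical name for the indexed passage field.
    """
    return {"size": k, "query": {"match": {field: question}}}


def extract_passages(response, field="text"):
    """Return (passage, score) pairs from an Elasticsearch search response."""
    return [
        (hit["_source"][field], hit["_score"])
        for hit in response["hits"]["hits"]
    ]


# A hand-written response in the shape Elasticsearch returns (illustrative data).
fake_response = {
    "hits": {
        "hits": [
            {"_score": 7.2, "_source": {"text": "Aspirin inhibits cyclooxygenase."}},
            {"_score": 3.1, "_source": {"text": "Ibuprofen is an NSAID."}},
        ]
    }
}

query_body = build_match_query("How does aspirin work?", k=2)
passages = extract_passages(fake_response)
```

In practice the query body would be sent to the index with an Elasticsearch client, and the returned passages would then be handed to whatever reader or encoder you want to test.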

jind11 · Jun 29 '22 07:06

Hi,

Thanks for answering my question!

I have a follow-up question. In your paper, where you describe fine-tuning pre-trained BERT models, you write: "Specifically, we construct the input sequence by concatenating [CLS], tokens in c, [SEP], tokens in qai, [SEP], where [CLS] and [SEP] are the classifier token and sentence separator in a pre-trained language model, respectively." My understanding is that the context c is a concatenation of all the textbooks. Wouldn't the input exceed the BERT token limit if you concatenate the question, the answers, and the context c?

serenayj · Jul 12 '22 17:07

Here, c is the set of top-K sentences/paragraphs retrieved from the textbooks, so we do not need to concatenate all the textbooks.
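To make the input layout concrete, here is a minimal sketch of assembling `[CLS] c [SEP] qa [SEP]` from the top-K retrieved passages while staying inside a token budget. It is an illustration under stated assumptions, not the paper's implementation: whitespace splitting stands in for a real WordPiece tokenizer, the helper name `build_input` is hypothetical, and trimming the context (rather than the question and answer) when the sequence is too long is one possible design choice.

```python
def build_input(passages, question, answer, max_len=512):
    """Assemble [CLS] + context + [SEP] + question/answer + [SEP].

    `passages` are the top-K retrieved texts, best first. Whitespace
    tokenization is an assumption used to keep the sketch self-contained.
    """
    qa_tokens = (question + " " + answer).split()
    # Reserve three slots for [CLS] and the two [SEP] tokens.
    budget = max_len - len(qa_tokens) - 3
    if budget < 0:
        raise ValueError("question and answer alone exceed max_len")

    context_tokens = []
    for passage in passages:
        context_tokens.extend(passage.split())
    # Keep the highest-ranked context and drop whatever does not fit.
    context_tokens = context_tokens[:budget]

    return ["[CLS]"] + context_tokens + ["[SEP]"] + qa_tokens + ["[SEP]"]


tokens = build_input(["a b c", "d e"], "q1 q2", "ans", max_len=8)
```

Because the context is trimmed last-ranked-first, the question and the candidate answer are always kept whole, and the default `max_len=512` matches the usual sequence limit of standard BERT checkpoints.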

jind11 · Jul 12 '22 23:07