PubMedQA Prompt format.
It's unclear to me from the paper how the QA examples are fed into the model: not so much the context and question, but the answer.
In the data you provide, the long answer is present in all three sets. Does BioGPT use the long answer as context to answer "yes|no|maybe", and is that what the reported 78.2% measures? Or does BioGPT score 78.2% by generating the long answer AND answering "yes|no|maybe"?
I'm trying to recreate the workflow and am having trouble identifying BioGPT's route to this; please advise.
The PubMedQA leaderboard (which has been changed to only show settings that require reasoning) does not seem to include BioGPT:
https://pubmedqa.github.io/
@badger-lord What is the expected format? It should not include the long answer as context for the reasoning-required setting, right?
Pulled from the PubMedQA paper: "A parallel setting, where models can use question and long answer to predict yes/no/maybe answer, is denoted as reasoning-free setting since yes/no/maybe are usually explicitly expressed in the long answers"
From the way the training/testing data is set up, I would say BioGPT is reasoning-free, not reasoning-required.
Yeah, but even setting that aside, reasoning-free should not include the context, right?
No, reasoning-free should include everything; reasoning-required should not include the long answer. Reasoning-free is easier than reasoning-required. So, for BioGPT, do you know which they are reporting? I would assume reasoning-free, since their train/test data contains the long answer.
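To make the difference concrete, here's a rough sketch (my own, not from the BioGPT repo) of what the two settings look like as model inputs, assuming the field names used in PubMedQA's ori_pqal.json (QUESTION, CONTEXTS, LONG_ANSWER, final_decision); the exact prompt wording BioGPT uses may differ:

```python
# Illustrative sketch only: how the two PubMedQA settings differ as model inputs.
# Field names follow PubMedQA's ori_pqal.json; BioGPT's actual preprocessing and
# prompt wording may differ.
import json

def build_input(example, reasoning_free=False):
    question = example["QUESTION"]
    context = " ".join(example["CONTEXTS"])
    if reasoning_free:
        # Reasoning-free: the long answer (which usually states yes/no/maybe
        # explicitly) is included in the input.
        context = context + " " + example["LONG_ANSWER"]
    # Reasoning-required: only the question and the abstract contexts.
    return f"question: {question} context: {context}"

with open("ori_pqal.json") as f:
    data = json.load(f)  # dict keyed by PMID

pmid, example = next(iter(data.items()))
print(build_input(example, reasoning_free=False))  # reasoning-required input
print(build_input(example, reasoning_free=True))   # reasoning-free input
print("gold label:", example["final_decision"])    # yes / no / maybe
```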
@badger-lord Hi~ Did you get a chance to recreate the workflow?
I followed BioGPT/examples/QA-PubMedQA/README.md to test PubMedQA.
generate_checkpoint.pt is generated after running infer.sh.
I see that generate_checkpoint.pt does not contain the long answer.
I'm not sure what this means.
I have not gotten it to run yet, actually; I'm still recreating the workflow due to some permission and setup issues. I was really curious what the data preprocessing does to the data before training. I'm going off of the data they provided in the GitHub repo, and I'm still not convinced they do reasoning-required, since they don't explicitly state it in the repo or the paper. Can you confirm that, when recreating the workflow, the training/testing data does not contain the long answer in the context?
@badger-lord I followed this, and it seems there is no answer span in their input data, so it should be reasoning-required. I got 0.782 with their provided checkpoint.
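For anyone who wants to double-check this on their own setup, a quick sanity check is to see whether the long answers show up verbatim in the file the model actually reads; model_input.txt below is a placeholder path for whatever the QA-PubMedQA preprocessing produces:

```python
# Rough sanity check: do the gold long answers leak into the model's input file?
# "model_input.txt" is a placeholder; point it at the actual preprocessed source
# file used by infer.sh in your setup.
import json

with open("ori_pqal.json") as f:
    gold = json.load(f)

with open("model_input.txt") as f:  # hypothetical path
    inputs = f.read()

leaked = sum(1 for ex in gold.values() if ex["LONG_ANSWER"].strip() in inputs)
print(f"{leaked}/{len(gold)} long answers found verbatim in the model inputs")
# A count near 0 points to the reasoning-required setting;
# a high count points to reasoning-free.
```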
Oh my God
This was for the large version?
I think it was Medium size (the one in the paper)
The paper had two. The update in March, I think, added the large 1.5-billion-parameter model at the end; the initial version, I believe, only had a 770M-parameter model.
Also, I checked the paper, and it explicitly mentions the reasoning-required setting in the Question Answering section:
We apply techniques such as two-stage fine-tuning [16] and noisy labels to improve the performance. We measure and compare the classification accuracy of the reasoning required setting described in [16].
https://arxiv.org/pdf/2210.10341.pdf
I was able to get a yes/no answer from the BioGPT-Large-PubMedQA model with the following prompt:
question: <question_text> context <context_text> the answer to the question given the context is
as described in the paper.
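In case it saves someone time, here's roughly how that prompt can be run through the Hugging Face checkpoint; the model id (microsoft/BioGPT-Large-PubMedQA) and the exact prompt punctuation are my assumptions, so adjust as needed:

```python
# Sketch: querying the BioGPT-Large-PubMedQA checkpoint with the prompt format above.
# The model id and prompt wording are assumptions; tweak them to match your setup.
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

model_id = "microsoft/BioGPT-Large-PubMedQA"  # assumed Hugging Face model id
tokenizer = BioGptTokenizer.from_pretrained(model_id)
model = BioGptForCausalLM.from_pretrained(model_id)

question = "..."  # a PubMedQA question
context = "..."   # abstract contexts only (no long answer, i.e. reasoning-required)

prompt = f"question: {question} context: {context} the answer to the question given the context is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)

print(tokenizer.decode(output[0], skip_special_tokens=True))
# With real inputs, the continuation should end in "yes", "no", or "maybe".
```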