
PubMedQA prompt format

badger-lord opened this issue 2 years ago • 13 comments

It's unclear to me from the paper how the QA data is fed into the model: not so much the context and question, but the answer.

In the data you have, the long answer is present in all three sets. Does BioGPT use the long answer as context to answer "yes|no|maybe", with the reported 78.2% being those results? Or does BioGPT score 78.2% by generating the long answer AND answering "yes|no|maybe"?

I'm trying to recreate the workflow and am having trouble identifying BioGPT's route to this; please advise.

badger-lord avatar Mar 27 '23 22:03 badger-lord

The PubMedQA leaderboard (which has been changed to "settings that require reasoning") no longer seems to list BioGPT.

https://pubmedqa.github.io/

ZON-ZONG-MIN avatar Apr 02 '23 20:04 ZON-ZONG-MIN

@badger-lord what is the expected format? It should not include the long answer as context for the reasoning-required setting, right?

ArvinZhuang avatar Apr 03 '23 10:04 ArvinZhuang

> @badger-lord what is the expected format? It should not include the long answer as context for the reasoning-required setting, right?

Pulled from the PubMedQA paper: "A parallel setting, where models can use question and long answer to predict yes/no/maybe answer, is denoted as reasoning-free setting since yes/no/maybe are usually explicitly expressed in the long answers"

From the way the training/testing data is set up, I would say BioGPT is reasoning-free, not reasoning-required.

badger-lord avatar Apr 03 '23 16:04 badger-lord

> Pulled from the PubMedQA paper: "A parallel setting, where models can use question and long answer to predict yes/no/maybe answer, is denoted as reasoning-free setting since yes/no/maybe are usually explicitly expressed in the long answers"
>
> From the way the training/testing data is set up, I would say BioGPT is reasoning-free, not reasoning-required.

Yeah, but even setting that aside, the reasoning-free setting should not include the context, right?

ArvinZhuang avatar Apr 03 '23 23:04 ArvinZhuang

> Yeah, but even setting that aside, the reasoning-free setting should not include the context, right?

No, reasoning-free should include everything; reasoning-required should not include the long answer. Reasoning-free is easier than reasoning-required. So, for BioGPT, do you know which setting they are reporting? I would assume reasoning-free, since their train/test data contains the long answer.
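To make the distinction concrete, here is a minimal sketch of the two input formats. The field names follow the PubMedQA JSON schema; the example instance itself is invented:

```python
# Illustrative only: field names follow the PubMedQA JSON schema
# (QUESTION, CONTEXTS, LONG_ANSWER, final_decision); the instance is made up.
instance = {
    "QUESTION": "Does treatment X improve outcome Y?",
    "CONTEXTS": [
        "Background section of the abstract...",
        "Methods and results sections...",
    ],
    "LONG_ANSWER": "Treatment X significantly improved outcome Y.",
    "final_decision": "yes",
}

# Reasoning-free: the model also sees the long answer (the abstract's
# conclusion), which usually states yes/no/maybe almost verbatim.
reasoning_free_input = " ".join(
    [instance["QUESTION"], *instance["CONTEXTS"], instance["LONG_ANSWER"]]
)

# Reasoning-required: the model sees only the question and the abstract
# contexts, and must infer the yes/no/maybe decision itself.
reasoning_required_input = " ".join([instance["QUESTION"], *instance["CONTEXTS"]])
```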

badger-lord avatar Apr 04 '23 13:04 badger-lord

@badger-lord Hi~ Did you get a chance to recreate the workflow?

I followed BioGPT/examples/QA-PubMedQA/README.md to test PubMedQA. A generate_checkpoint.pt file is produced after running infer.sh, and I see that generate_checkpoint.pt does not contain the long answer. I'm not sure what this means?

ZON-ZONG-MIN avatar Apr 10 '23 19:04 ZON-ZONG-MIN

> @badger-lord Hi~ Did you get a chance to recreate the workflow?
>
> I followed BioGPT/examples/QA-PubMedQA/README.md to test PubMedQA. A generate_checkpoint.pt file is produced after running infer.sh, and I see that generate_checkpoint.pt does not contain the long answer. I'm not sure what this means?

I have not actually got it to run yet; I'm still recreating the workflow due to some permissions and setup issues. I was really curious what the preprocessing does to the data before training. I'm going off the data they provided in the GitHub repo, and I'm still not convinced they do reasoning-required, since they don't explicitly state it in the repo or the paper. When recreating the workflow, can you confirm that the training/testing data does not contain the long answer in the context?

badger-lord avatar Apr 11 '23 14:04 badger-lord

@badger-lord I followed this, and it seems there is no answer span in their input data, so it should be reasoning-required. I got 0.782 with their provided checkpoint.
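One way to check this yourself; the paths below are placeholders for a local PubMedQA checkout and the preprocessed source file that infer.sh consumes, and the LONG_ANSWER field name follows the PubMedQA JSON schema:

```python
import json

# Placeholder paths: point these at your local PubMedQA data and the
# preprocessed source file the model actually reads at inference time.
gold = json.load(open("pubmedqa/data/test_set.json"))
model_inputs = open("preprocessed/test.x").read()

# If gold long answers appear verbatim in the model inputs, the setup is
# reasoning-free; if none do, it is reasoning-required.
leaked = sum(
    1 for item in gold.values() if item["LONG_ANSWER"].strip() in model_inputs
)
print(f"{leaked}/{len(gold)} long answers found in model inputs")
```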

ArvinZhuang avatar Apr 11 '23 23:04 ArvinZhuang

Oh my God

rezabuughunter avatar May 06 '23 06:05 rezabuughunter

> @badger-lord I followed this, and it seems there is no answer span in their input data, so it should be reasoning-required. I got 0.782 with their provided checkpoint.

This was for the large version?

SantoshGuptaML avatar May 15 '23 15:05 SantoshGuptaML

I think it was the medium size (the one in the paper).

ArvinZhuang avatar May 16 '23 06:05 ArvinZhuang

> I think it was the medium size (the one in the paper).

The paper had two; the March update, I think, added the large 1.5-billion-parameter model at the end. The initial version, I believe, only had a 770M-parameter model.

Also, I checked the paper, and it explicitly mentions reasoning-required. In the Question Answering section:

> We apply techniques such as two-stage fine-tuning [16] and noisy labels to improve the performance. We measure and compare the classification accuracy of the reasoning required setting described in [16].

https://arxiv.org/pdf/2210.10341.pdf

SantoshGuptaML avatar May 18 '23 15:05 SantoshGuptaML

I was able to get a yes/no answer from the BioGPT-Large-PubMedQA model with the following prompt:

```
question: <question_text> context <context_text> the answer to the question given the context is
```

as described in the paper.
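For anyone trying to reproduce this, here is a minimal sketch assuming the Hugging Face port of BioGPT (the microsoft/BioGPT-Large-PubMedQA checkpoint on the Hub). The prompt string is copied from the comment above; the example question and context are invented, and the decoding settings are assumptions, not the paper's exact setup:

```python
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

model_id = "microsoft/BioGPT-Large-PubMedQA"
tokenizer = BioGptTokenizer.from_pretrained(model_id)
model = BioGptForCausalLM.from_pretrained(model_id)
model.eval()

question = "Does treatment X improve outcome Y?"  # hypothetical example
context = "In a randomized trial, treatment X was associated with ..."  # hypothetical

# Prompt wording copied verbatim from the comment above.
prompt = f"question: {question} context {context} the answer to the question given the context is"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)

# generate() returns the prompt plus the continuation; slice off the prompt
# tokens, and the continuation should begin with "yes", "no", or "maybe".
continuation = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(continuation, skip_special_tokens=True))
```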

Steven-Yiran avatar Oct 15 '24 18:10 Steven-Yiran