covid-papers-browser
Train transformer models on MedNLI
Great work!
Here's some data that might be more domain-related. It's not open, but it might be helpful? https://jgc128.github.io/mednli/
Looks neat! Didn't know about datasets for medical NLI, that's perfect for our use case indeed!
If someone is interested in finetuning the three pretrained models (SciBERT, BioBERT and CovidBERT) on this MedNLI dataset using the finetune_nli.py
script and uploading them to the HuggingFace cloud, I'll add them to the list!
I'm changing the Issue title to make this visible for other contributors!
I'm interested in finetuning BioBERT on the MedNLI dataset. I need the following information:
a) Why did you choose a batch size of 64 instead of 16 to train all the NLI models (biobert-nli, scibert-nli and covidbert-nli)?
b) How many epochs did you train these models for? (The default number of epochs is 1 in the sentence-transformers library.)
Thanks in advance, @gsarti
Hi @kalyanks0611,
The choice of a larger batch size was only due to the intuition that it would limit noise during training; I have no empirical proof that it leads to better downstream performance in practice.
The NLI models were trained for different numbers of steps (20,000, 23,000 and 30,000 respectively): this was also dictated by GPU time allowances rather than set empirically. 30,000 steps at batch size 64 correspond to 1,920,000 examples, which is a bit less than two full epochs on MultiNLI + SNLI, which together account for roughly 1M sentence pairs.
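For reference, here is a minimal sketch (not the actual finetune_nli.py script) of where batch size and epochs are set when fine-tuning one of these checkpoints on NLI-style data with sentence-transformers; the BioBERT checkpoint name and the load_mednli_train() helper are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Build a SentenceTransformer from a pretrained BERT-style checkpoint
# (placeholder name; swap in the BioBERT/SciBERT/CovidBERT checkpoint you use)
word_embedding_model = models.Transformer("dmis-lab/biobert-base-cased-v1.1", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Premise/hypothesis pairs with the three NLI labels.
# load_mednli_train() is a hypothetical loader for the MedNLI jsonl files.
label2id = {"entailment": 0, "neutral": 1, "contradiction": 2}
train_examples = [
    InputExample(texts=[premise, hypothesis], label=label2id[label])
    for premise, hypothesis, label in load_mednli_train()
]

# Batch size is set on the DataLoader (64 here, as in the NLI models above)
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# Epochs default to 1 in sentence-transformers; the number of steps per epoch
# is then len(train_examples) / batch_size.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=1000,
)
model.save("biobert-mednli")
```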
Hope this helps!