Kyle Lo
Hey @arthurbra, thanks for your interest! I've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the...
Hey @CyndxAI unfortunately the SciBERT pretraining corpus is not publicly available. If you're interested in a large pretraining corpus for training these large language models, I can point you to...
Hey @monologg, can you try using `allenai/scibert_scivocab_uncased`? These two models actually have different vocabularies/weights, so it's not just a matter of different Tokenizer setting.
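In case it's useful for anyone hitting the same mismatch, here's a minimal sketch (using the HuggingFace `transformers` API rather than this repo's scripts) of loading the uncased checkpoint so the tokenizer and weights stay in sync; the example sentence is just illustrative:

```python
# Minimal sketch: load the uncased SciBERT checkpoint from the HuggingFace hub.
# The vocabulary ships with the checkpoint, so the tokenizer and model must
# point at the same model name.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The catalytic activity of the enzyme was measured.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # [1, seq_len, 768]
```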
Hey Stefan, Thanks for your interest in the project. I'll look into the line counting issue, and update the reported numbers. As for the dataset splits in JNLPBA, it might...
@shizhediao ARC-ACL (Jurgens et al.) is citation_intent, and SciCite (Cohan et al.) is sci_cite. You can verify this by looking at the labels in the datasets; SciCite has fewer label types than ARC-ACL.
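If you want to check this yourself, a rough sketch is below; it assumes JSON-lines files with a `"label"` field, and the paths are illustrative, so point them at wherever your train files actually live:

```python
# Rough sketch: count distinct label types per task.
# Assumes JSON-lines files with a "label" field; paths are illustrative.
import json

def label_set(path):
    with open(path) as f:
        return {json.loads(line)["label"] for line in f if line.strip()}

for task in ("citation_intent", "sci_cite"):
    labels = label_set(f"data/text_classification/{task}/train.txt")
    print(task, len(labels), sorted(labels))
```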
Sorry for missing this. I believe the issue you're seeing is a metric mismatch. For the Chemprot result, the standard metric is micro-F1 (which is computationally equivalent to accuracy) not...
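A quick sanity check of that equivalence (with made-up labels), in case it's useful:

```python
# For single-label multi-class predictions, micro-averaged F1 and accuracy coincide.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2]  # illustrative gold labels
y_pred = [0, 2, 2, 2, 1, 0, 1]  # illustrative predictions

print(accuracy_score(y_true, y_pred))             # 0.714...
print(f1_score(y_true, y_pred, average="micro"))  # same value
```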
Hi @transpurs, even if you start from our pre-trained model, you should be able to address overfitting to your own RE dataset by careful finetuning. I recommend first trying different...
@transpurs glad to hear that it's helping! Unfortunately you'll have to modify the allennlp configuration file to do this. As you can see here in the model definition: https://github.com/allenai/scibert/blob/7598219a8d80b9c2fe1323a141e4a9e40ec044cb/scibert/models/bert_text_classifier.py#L28 dropout...
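As a rough sketch of the kind of config edit meant here (assuming the registered model name is `bert_text_classifier` and that the constructor's `dropout` argument is exposed directly as a config key; check the structure of your own config file):

```jsonnet
// Hypothetical fragment of an allennlp training config.
{
  "model": {
    "type": "bert_text_classifier",
    "dropout": 0.3,  // raise from the default to regularize more aggressively
    // ...leave the rest of the model block unchanged...
  },
  // ...dataset readers, trainer, etc. unchanged...
}
```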
Thanks for the interest. We're currently in the process of running the fine-tuning experiments :) Look out for the updated results once they finish.
@Saichethan Our huggingface compatible weights are only for PyTorch.