
Train models for each major entity class

[Open] JohnGiorgi opened this issue 6 years ago • 3 comments

Need to train models for each major entity class: PRGE, LIVB, DISO, and CHED. The first three are fairly straightforward. The last is trickier because the chemical annotations come at multiple levels of granularity; for now, we might just cheat and collapse everything under the CHED tag.
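A minimal sketch of what "collapse everything under the CHED tag" could look like at the label level. It assumes BIO-style tags (e.g. B-DRUG, I-DRUG, O), and the subtype names DRUG and METABOLITE are hypothetical placeholders, not the actual labels of any corpus or of Saber.

```python
from typing import Iterable, List, Set


def collapse_to_ched(tags: Iterable[str], ched_subtypes: Set[str]) -> List[str]:
    """Replace fine-grained chemical labels with the single CHED label.

    Assumes BIO-style tags such as "B-DRUG" / "I-DRUG" / "O"; the subtype
    names passed in are hypothetical, not any corpus's real label set.
    """
    collapsed = []
    for tag in tags:
        # Split "B-DRUG" into the prefix ("B") and the entity label ("DRUG").
        prefix, sep, label = tag.partition("-")
        if sep and label in ched_subtypes:
            collapsed.append(f"{prefix}-CHED")
        else:
            # "O" tags and non-chemical entities pass through unchanged.
            collapsed.append(tag)
    return collapsed


if __name__ == "__main__":
    tags = ["B-DRUG", "I-DRUG", "O", "B-METABOLITE", "B-PRGE"]
    print(collapse_to_ched(tags, {"DRUG", "METABOLITE"}))
    # ['B-CHED', 'I-CHED', 'O', 'B-CHED', 'B-PRGE']
```

Keeping the B/I prefix intact means span boundaries survive the collapse; only the entity label changes.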

For relations, we are at the mercy of what datasets are available. Right now, we could train a model for adverse drug events using the ADE corpus.

There should be a base and a large version of each model. In the case of BERT, this would correspond to whether the BERT base or large model was used. Any model not implemented should raise a NotImplementedError (see #155).
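One way the base/large split and the NotImplementedError rule might fit together is a small registry of implemented sizes. This is a hypothetical sketch: the registry, the get_model function, and the model name "example-model" are illustrative and not part of Saber's code.

```python
from typing import Dict, Set

# Hypothetical registry: which sizes have actually been implemented per model.
IMPLEMENTED_SIZES: Dict[str, Set[str]] = {
    "bert": {"base", "large"},
    "example-model": {"base"},  # hypothetical model with no large variant yet
}


def get_model(name: str, size: str = "base") -> str:
    """Return an identifier for the requested model, or fail loudly."""
    if size not in ("base", "large"):
        raise ValueError(f"size must be 'base' or 'large', got {size!r}")
    if size not in IMPLEMENTED_SIZES.get(name, set()):
        # Unimplemented combinations raise instead of silently falling back.
        raise NotImplementedError(f"{name} ({size}) is not implemented")
    return f"{name}-{size}"
```

For example, get_model("bert", "large") returns "bert-large", while get_model("example-model", "large") raises NotImplementedError. A typo in the size raises ValueError instead, so the two failure modes stay distinguishable.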

Finally, the model names should follow a convention. Maybe [model-name]-[entity or relation]-[base or large], e.g. bert-for-ner-prge (base) and bert-for-ner-prge-lg (large). See PyTorch Transformers or spaCy for inspiration.
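A sketch of a name builder following the examples above, assuming base models carry no suffix and large models get "-lg" (as in bert-for-ner-prge vs. bert-for-ner-prge-lg). The "re" task tag for relation models is a hypothetical choice; the issue does not specify one.

```python
ENTITY_CLASSES = {"prge", "livb", "diso", "ched"}
RELATION_TYPES = {"ade"}


def model_name(model: str, target: str, size: str = "base") -> str:
    """Build a model name such as 'bert-for-ner-prge' or 'bert-for-ner-prge-lg'."""
    target = target.lower()
    if target in ENTITY_CLASSES:
        task = "ner"
    elif target in RELATION_TYPES:
        task = "re"  # hypothetical task tag for relation models
    else:
        raise ValueError(f"unknown entity class or relation: {target!r}")
    if size not in ("base", "large"):
        raise ValueError(f"size must be 'base' or 'large', got {size!r}")
    name = f"{model}-for-{task}-{target}"
    # Base is the default and gets no suffix; large gets '-lg' (spaCy-style).
    return name if size == "base" else f"{name}-lg"


if __name__ == "__main__":
    print(model_name("bert", "PRGE"))           # bert-for-ner-prge
    print(model_name("bert", "prge", "large"))  # bert-for-ner-prge-lg
```

Validating the target against the known classes keeps misspelled names from ever being generated or published.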

BERT

Entities

  • [ ] Train PRGE-base
  • [ ] Train PRGE-large
  • [ ] Train LIVB-base
  • [ ] Train LIVB-large
  • [ ] Train DISO-base
  • [ ] Train DISO-large
  • [ ] Train CHED-base
  • [ ] Train CHED-large

Relations

  • [ ] Train ADE

JohnGiorgi, Aug 02 '18 19:08

Hi @JohnGiorgi.

I am currently working on a review of taxon mention recognition tools for ecological information extraction, and I have just discovered Saber, which I'd like to include as an example of a state-of-the-art deep-learning-based approach.

Unfortunately, it seems that the LIVB pre-trained model does not exist at the moment. Any idea when it might be available? Or should I consider training my own model?

Thank you for your help.

nleguillarme, Oct 13 '20 13:10

Hi @nleguillarme,

Thanks for your interest. Unfortunately, we are no longer maintaining the project. I would suggest checking out AllenNLP, Transformers or ScispaCy for state-of-the-art NER. ScispaCy has pretrained models that will detect organism names (see the model trained on BIONLP13CG specifically).
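A hedged sketch of pulling organism mentions out of the ScispaCy BIONLP13CG model. It assumes spacy, scispacy, and the en_ner_bionlp13cg_md model package are installed, and that the model tags organisms with the BioNLP 2013 CG label ORGANISM; the helper names are hypothetical. The spaCy import is deferred so the filtering helper works without those packages.

```python
from typing import Iterable, List, Tuple


def organism_mentions(entities: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the text of (text, label) pairs labelled ORGANISM."""
    return [text for text, label in entities if label == "ORGANISM"]


def tag_organisms(text: str, model: str = "en_ner_bionlp13cg_md") -> List[str]:
    """Run a ScispaCy NER model over text and return organism mentions.

    Assumes the model package is installed; the import is deferred so
    organism_mentions can be used (and tested) without spaCy.
    """
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    return organism_mentions((ent.text, ent.label_) for ent in doc.ents)
```

Usage would be something like tag_organisms("Wolves prey on elk in Yellowstone."); what the model actually returns depends on its training data, which is biomedical rather than ecological.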

JohnGiorgi, Oct 13 '20 14:10

Too bad the project is dead; it seemed like a great tool. Thanks for the pointers.

nleguillarme, Oct 13 '20 14:10