saber
Saber is a deep-learning-based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not st...
Currently, the `config.ini` file, which contains settings for using Saber, is highly coupled to the Keras BiLSTM-CRF model. This needs to be fixed. One solution would be to maintain a...
Currently, the splitting of training data (into validation or cross-validation splits) happens in `prepare_data_for_training()`, which is defined by each model. It should be moved from the models themselves to...
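As a rough illustration of what a model-independent split could look like, here is a minimal sketch. The function name `train_valid_split` and its parameters are hypothetical and not part of Saber; the sketch only assumes that sentences and labels are parallel lists.

```python
import random


def train_valid_split(sentences, labels, valid_fraction=0.1, seed=0):
    """Shuffle paired examples and split them into train and validation sets.

    Hypothetical helper: the name and signature are assumptions, not Saber's API.
    """
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")

    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    # Always hold out at least one example for validation.
    n_valid = max(1, int(len(indices) * valid_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

    train = ([sentences[i] for i in train_idx], [labels[i] for i in train_idx])
    valid = ([sentences[i] for i in valid_idx], [labels[i] for i in valid_idx])
    return train, valid
```

Because a helper like this only sees plain lists, every model could call it (or receive its output) instead of reimplementing the split.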
Currently, to align BERT tokens to original tokens (before BERT tokenization) we use some code I grabbed from the official BERT repo. spaCy has introduced [functions specifically for aligning two...
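For context, the alignment idea can be sketched as building a map from each original token to the index of its first sub-token, in the spirit of the `orig_to_tok_map` example in the BERT README. This is not Saber's actual code, and `toy_wordpiece` is a hypothetical stand-in for a real WordPiece tokenizer.

```python
def align_tokens(orig_tokens, tokenize):
    """Map each original token to the index of its first sub-token.

    `tokenize` is any callable that turns one token into a list of sub-tokens.
    """
    sub_tokens = ["[CLS]"]
    orig_to_sub = []
    for token in orig_tokens:
        # The next sub-token appended will be the first piece of `token`.
        orig_to_sub.append(len(sub_tokens))
        sub_tokens.extend(tokenize(token))
    sub_tokens.append("[SEP]")
    return sub_tokens, orig_to_sub


def toy_wordpiece(token, max_len=4):
    """Hypothetical tokenizer for illustration: split into fixed-width chunks."""
    pieces = [token[i:i + max_len] for i in range(0, len(token), max_len)] or [token]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]
```

With such a map, token-level labels can be projected onto the sub-token sequence by assigning each label to position `orig_to_sub[i]`.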
Currently, things are saved in different directories under wherever the `saber` directory is on a user's computer. This should be changed such that everything is saved in some directory directly...
The `Metrics` class is a bit of a mess. It should behave more like a Keras `History` callback, and should be the object returned by a call to `saber.train()`. This...
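One possible shape for a `History`-like `Metrics` object is sketched below. It mirrors the `epoch` and `history` attributes of the Keras `History` callback; the `best()` method is a hypothetical addition, and the sketch assumes every metric is logged at every epoch.

```python
class Metrics:
    """Record per-epoch metric values in a Keras-History-like `history` dict.

    Illustrative sketch only; not the existing Saber `Metrics` class.
    """

    def __init__(self):
        self.epoch = []    # epoch indices, in the order they were logged
        self.history = {}  # metric name -> list of per-epoch values

    def on_epoch_end(self, epoch, logs=None):
        """Append this epoch's values; `logs` maps metric names to numbers."""
        self.epoch.append(epoch)
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(value)

    def best(self, name, mode="max"):
        """Return (epoch, value) for the best logged value of `name`."""
        values = self.history[name]
        pick = max if mode == "max" else min
        idx = pick(range(len(values)), key=values.__getitem__)
        return self.epoch[idx], values[idx]
```

Returning an object like this from training would let callers inspect `metrics.history["f1"]` the same way they would inspect a Keras `History`.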
Check out [this grounding service](https://github.com/bgyori/gilda) built by our collaborators and consider switching if it outperforms the current system.
`Saber.load_dataset()` should be able to pull from [pubannotation.org](http://pubannotation.org/) given a project's URL. E.g.

```python
saber.load_dataset('http://pubannotation.org/projects/AGAC_training/annotations.tgz')
```

should download the dataset to `~/saber/datasets`, convert it to the CoNLL 2003 format, and...
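The download step could be sketched roughly as follows, using only the standard library. The helper names `project_name_from_url` and `download_dataset` are hypothetical, the archive is assumed to be a gzipped tarball from a trusted source, and the conversion to CoNLL 2003 is deliberately left out.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def project_name_from_url(url):
    """Extract the project name from a `.../projects/<name>/...` URL path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "projects" not in parts or parts.index("projects") + 1 >= len(parts):
        raise ValueError(f"not a project URL: {url}")
    return parts[parts.index("projects") + 1]


def download_dataset(url, root=Path.home() / "saber" / "datasets"):
    """Download a .tgz archive and unpack it under `root/<project name>`.

    Hypothetical helper; assumes the archive comes from a trusted source.
    """
    target = Path(root) / project_name_from_url(url)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / "annotations.tgz"
    urllib.request.urlretrieve(url, archive)  # network access happens here
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target
```

Keeping the URL parsing separate from the network call makes the naming logic easy to unit test without downloading anything.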
Need to add a way to print out a nicely formatted representation of a PyTorch model. Wrap this under a `summary()` function in the `BasePyTorchModel` so that its usage is...
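A minimal sketch of such a summary is shown below. It relies only on `named_parameters()`, `shape`, `numel()`, and `requires_grad`, which PyTorch modules and parameters provide; the standalone function form and the table layout are assumptions, and wiring it into `BasePyTorchModel` is left open.

```python
def summary(model):
    """Return a table of parameter names, shapes, and counts for a model.

    Works with any object exposing `named_parameters()` the way a
    `torch.nn.Module` does; the layout is an illustrative choice.
    """
    rows = [(name, tuple(p.shape), p.numel(), p.requires_grad)
            for name, p in model.named_parameters()]
    width = max([len("Parameter")] + [len(r[0]) for r in rows])

    lines = [f"{'Parameter':<{width}}  {'Shape':<16}  {'Count':>10}"]
    for name, shape, count, _ in rows:
        lines.append(f"{name:<{width}}  {str(shape):<16}  {count:>10,}")

    total = sum(r[2] for r in rows)
    trainable = sum(r[2] for r in rows if r[3])
    lines.append(f"Total parameters: {total:,}")
    lines.append(f"Trainable parameters: {trainable:,}")
    return "\n".join(lines)
```

With PyTorch installed, `print(summary(torch.nn.Linear(4, 3)))` would list the `weight` and `bias` parameters and their totals.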
Currently, the `debug` argument loads only 10K embeddings, which is helpful for debugging. It would be useful if it also loaded only a proportion of the sentences!
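One way this could work is sketched below, assuming the data is in a CoNLL-style format with blank lines between sentences. The function names and the `debug_proportion` parameter are hypothetical, and special lines such as `-DOCSTART-` are not handled.

```python
import math


def read_conll_sentences(lines):
    """Group CoNLL-style lines into sentences separated by blank lines."""
    sentence = []
    for line in lines:
        line = line.strip()
        if line:
            sentence.append(line.split())  # one row of columns per token
        elif sentence:
            yield sentence
            sentence = []
    if sentence:  # the file may not end with a blank line
        yield sentence


def load_sentences(lines, debug=False, debug_proportion=0.1):
    """Load all sentences, or only the leading proportion when `debug` is set."""
    sentences = list(read_conll_sentences(lines))
    if debug and sentences:
        keep = max(1, math.ceil(len(sentences) * debug_proportion))
        sentences = sentences[:keep]
    return sentences
```

Taking the leading sentences rather than a random sample keeps debug runs deterministic, at the cost of a less representative subset.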
The current suite of unit tests leaves a lot to be desired. Steps to improve this:

- [ ] Set aside enough time to learn the ins and outs of...