
Saber is a deep-learning-based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not st...

Results 38 saber issues

Currently, the `config.ini` file, which contains settings for using Saber, is highly coupled to the Keras BiLSTM-CRF model. This needs to be fixed. One solution would be to maintain a...

enhancement
invalid

Currently, the splitting of training data (into validation or cross-validation splits) happens in `prepare_data_for_training()`, which is defined by each model. It should be moved from the models themselves to...
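As a rough illustration of what model-independent splitting could look like, here is a minimal sketch using only the standard library. The function names `train_valid_split` and `k_fold_splits` are hypothetical and are not part of Saber; where such helpers would live in the codebase is left open.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_valid_split(
    sentences: Sequence[T], valid_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle `sentences` and hold out `valid_fraction` of them for validation."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    # Hold out at least one example so the validation set is never empty.
    n_valid = max(1, int(len(indices) * valid_fraction))
    valid = [sentences[i] for i in indices[:n_valid]]
    train = [sentences[i] for i in indices[n_valid:]]
    return train, valid


def k_fold_splits(
    sentences: Sequence[T], k: int = 5, seed: int = 42
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield `k` (train, valid) pairs; each example is held out exactly once."""
    if k < 2 or k > len(sentences):
        raise ValueError("k must be between 2 and the number of sentences")
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        valid = [sentences[i] for i in folds[held_out]]
        train = [
            sentences[i]
            for j, fold in enumerate(folds)
            if j != held_out
            for i in fold
        ]
        yield train, valid
```

Because these helpers only see a sequence of examples, any model could call them (or receive their output) without defining its own splitting logic.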

enhancement
invalid

Currently, to align BERT tokens to original tokens (before BERT tokenization) we use some code I grabbed from the official BERT repo. SpaCy has introduced [functions specifically for aligning two...
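For context, the alignment in question can be sketched in a few lines: record, for each original token, the index of its first sub-token in the BERT sequence. This follows the general pattern described in the official BERT repo's README, but it is only a sketch. `align_wordpieces` and `toy_tokenize` are hypothetical names, and `toy_tokenize` is a stand-in for a real WordPiece tokenizer, not the actual algorithm.

```python
from typing import Callable, List, Tuple


def align_wordpieces(
    orig_tokens: List[str], tokenize: Callable[[str], List[str]]
) -> Tuple[List[str], List[int]]:
    """Return the sub-token sequence and a map from original tokens into it.

    `orig_to_tok_map[i]` is the index in `bert_tokens` of the first
    sub-token produced for `orig_tokens[i]`.
    """
    bert_tokens = ["[CLS]"]
    orig_to_tok_map = []
    for token in orig_tokens:
        orig_to_tok_map.append(len(bert_tokens))
        bert_tokens.extend(tokenize(token))
    bert_tokens.append("[SEP]")
    return bert_tokens, orig_to_tok_map


def toy_tokenize(token: str, max_len: int = 4) -> List[str]:
    """Hypothetical stand-in for WordPiece: split into fixed-size chunks,
    marking continuation pieces with a leading '##'."""
    pieces = [token[i:i + max_len] for i in range(0, len(token), max_len)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


bert_tokens, orig_to_tok_map = align_wordpieces(["BRCA1", "mutations"], toy_tokenize)
```

With a map like this, a label predicted for the first sub-token of each word can be projected back onto the original token.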

Currently, things are saved in different directories under wherever the `saber` directory is on a user's computer. This should be changed such that everything is saved in some directory directly...

The `Metrics` class is a bit of a mess. It should behave more like a Keras `History` callback, and should be the returned object from a call to `saber.train()`. This...
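To make the comparison concrete: a Keras `History` object exposes an `epoch` list and a `history` dict that maps each metric name to its per-epoch values. The sketch below mimics that shape in plain Python. The `on_epoch_end` signature and the `best` helper are assumptions for illustration, not Saber's actual `Metrics` API.

```python
from typing import Dict, List, Tuple


class Metrics:
    """History-like record of per-epoch metric values (illustrative only)."""

    def __init__(self) -> None:
        self.epoch: List[int] = []
        self.history: Dict[str, List[float]] = {}

    def on_epoch_end(self, epoch: int, logs: Dict[str, float]) -> None:
        """Append this epoch's values, one list per metric name."""
        self.epoch.append(epoch)
        for name, value in logs.items():
            self.history.setdefault(name, []).append(value)

    def best(self, metric: str, mode: str = "max") -> Tuple[int, float]:
        """Return (epoch, value) for the best recorded value of `metric`.

        Assumes `metric` was logged on every recorded epoch.
        """
        values = self.history[metric]
        pick = max if mode == "max" else min
        idx = pick(range(len(values)), key=values.__getitem__)
        return self.epoch[idx], values[idx]
```

Returning an object like this from a training call would let callers inspect the recorded values directly rather than parsing printed output.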

enhancement
invalid
chore

Check out [this grounding service](https://github.com/bgyori/gilda) built by our collaborators and consider switching if it outperforms the current system.

enhancement
feature

`Saber.load_dataset()` should be able to pull from [pubannotation.org](http://pubannotation.org/) given a project's URL. E.g.

```python
saber.load_dataset('http://pubannotation.org/projects/AGAC_training/annotations.tgz')
```

should download the dataset to `~/saber/datasets`, convert it to the CoNLL 2003 format, and...
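A minimal sketch of the download step only, under stated assumptions: the helper names `project_name_from_url` and `download_annotations` are hypothetical, the URL is assumed to have the `/projects/<name>/...` shape of the example in the issue, and the conversion to CoNLL 2003 is not attempted here.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def project_name_from_url(url: str) -> str:
    """Extract the project name from a URL shaped like /projects/<name>/..."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "projects":
        raise ValueError(f"not a recognised project URL: {url}")
    return parts[1]


def download_annotations(
    url: str, root: Path = Path.home() / "saber" / "datasets"
) -> Path:
    """Download the archive into <root>/<project>/ and return its local path."""
    target_dir = Path(root) / project_name_from_url(url)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / Path(urlparse(url).path).name
    if not archive.exists():  # skip the download if the archive is already cached
        urlretrieve(url, str(archive))
    return archive
```

Keeping the URL parsing separate from the download makes the naming logic testable without network access.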

enhancement
feature

Need to add a way to print out a nicely formatted representation of a PyTorch model. Wrap this under a `summary()` function in the `BasePyTorchModel` so that its usage is...

enhancement
design
production

Currently, the `debug` argument loads only 10K embeddings, which is helpful for debugging. It would be useful if it also loaded only a proportion of the sentences!
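One way this could work is sketched below. `debug_subset` is a hypothetical helper, and taking the leading sentences (rather than a random sample) is an assumption made so that debug runs stay deterministic.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def debug_subset(sentences: Sequence[T], proportion: float = 0.1) -> List[T]:
    """Return the first `proportion` of `sentences`, keeping at least one
    sentence whenever the input is non-empty."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in the interval (0, 1]")
    n_keep = max(1, int(len(sentences) * proportion))
    return list(sentences[:n_keep])
```

A dataset loader could apply this right after reading the sentences whenever `debug` is set, before any further preprocessing.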

enhancement
invalid

The current suite of unit tests leaves a lot to be desired. Steps to improve this:

- [ ] Set aside enough time to learn the ins and outs of...

invalid
optimization
production