saber
Saber is a deep-learning based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not st...
We should test whether using the cyclical learning rate finder (paper: [here](https://arxiv.org/abs/1506.01186)) together with an adaptive learning rate optimizer (e.g., Adam) improves on our current optimizer (Nadam). __Todo__ - [...
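As a rough illustration of the idea in the linked paper, here is a minimal sketch of its triangular cyclical learning rate policy in plain Python. The function name `triangular_lr` and the default values are assumptions made for this example; none of this is Saber code. The first half cycle (iterations `0` to `step_size`) is a linear ramp from `base_lr` to `max_lr`, which is the kind of sweep a learning rate range test performs.

```python
import math


def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Learning rate at a given iteration under the triangular policy.

    step_size is the number of iterations in half a cycle. All defaults
    here are illustrative, not values used by Saber.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    # Print the learning rate at a few points of the first cycle.
    for it in (0, 1000, 2000, 3000, 4000):
        print(it, triangular_lr(it))
```

For a range test, one would train briefly while stepping through the ramp, record the loss at each learning rate, and pick the range where the loss falls fastest.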
Currently, we are using spaCy to do low-level NLP tasks (like tokenization, sentence segmentation, POS tagging, and parsing). However, these models were trained on general-domain text. The folks...
Evaluation is currently very slow. We should profile it and see whether it can be sped up. Additionally, add a new config argument `--evaluation_step` that would perform evaluation only every...
Implement better handling of rare words. Currently, we simply drop words that appear fewer than some threshold number of times (`saber.constants.NUM_RARE`). There exist better schemes, such as __word dropout__, which...
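To make the alternative concrete, here is a minimal sketch of one frequency-based variant of word dropout: during training, each token is replaced by an unknown-word placeholder with probability `alpha / (alpha + count)`, so rare words are replaced often and frequent words almost never. This may not be the exact scheme the issue has in mind. The names `word_dropout`, `UNK`, and `alpha` are hypothetical and not part of Saber.

```python
import random
from collections import Counter

UNK = "<UNK>"  # hypothetical placeholder token, not a Saber constant


def word_dropout(tokens, counts, alpha=0.25, rng=None):
    """Replace each token with UNK with probability alpha / (alpha + count)."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        count = counts.get(tok, 0)
        denom = alpha + count
        # Unseen tokens with alpha == 0 would divide by zero; always drop them.
        p_drop = alpha / denom if denom > 0 else 1.0
        out.append(UNK if rng.random() < p_drop else tok)
    return out


if __name__ == "__main__":
    corpus = ["p53", "p53", "p53", "glucose"]
    counts = Counter(corpus)
    print(word_dropout(["p53", "glucose", "BRCA1"], counts, rng=random.Random(0)))
```

Unlike a hard threshold, this keeps rare words in the vocabulary while still teaching the model to handle unknown tokens.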
We need to properly package Saber and publish it on PyPI so that it can be installed with `pip install saber`. I don't know how to do this, so I need to...
When transfer learning, the argument `fine_tune_word_embeddings` has no effect. We need to retroactively set the embedding layer to `trainable` if `fine_tune_word_embeddings` is set, in such a way that it will work for simple...
The `saber.cli.lr_find` tool we are working on will output graphs to help the user (and us) determine the best learning rate range. It would be cool if we could output...
The annotated entities in a given corpus roughly follow a [Zipfian distribution](https://en.wikipedia.org/wiki/Zipf%27s_law). This means that some entities are repeated many, many times (e.g. `Human`, `Mouse`, `p53`, `glucose`), but _most_ entities...
Due to the current implementation of the `Config` class, any argument whose default value is `True` in the config file and which has the `action=store_true` property is useless from the...
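The underlying behaviour can be reproduced with plain `argparse`. The sketch below assumes that `Config` builds its parser with `argparse` (the `action=store_true` wording suggests this, but it is an assumption), and the flag name `--verbose` is hypothetical. It shows why such a flag cannot be switched off, followed by one possible workaround: a paired `--no-...` flag using `store_false`.

```python
import argparse


def stuck_parser():
    """A store_true flag whose default is already True."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag; passing it or omitting it both yield True.
    parser.add_argument("--verbose", action="store_true", default=True)
    return parser


def fixed_parser():
    """One possible fix: add a negative flag that writes to the same dest."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", dest="verbose", action="store_true")
    parser.add_argument("--no-verbose", dest="verbose", action="store_false")
    parser.set_defaults(verbose=True)
    return parser


if __name__ == "__main__":
    print(stuck_parser().parse_args([]).verbose)
    print(stuck_parser().parse_args(["--verbose"]).verbose)
    print(fixed_parser().parse_args(["--no-verbose"]).verbose)
```

With the paired flag, the default stays `True` while the user can still disable the option explicitly.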
There is currently little to no `tensorboard` support. It would be helpful if this were properly set up -- hardly a priority at this point in time though.