John Giorgi
Need to train models for each major entity class: `PRGE`, `LIVB`, `DISO`, `CHED`. The first three are fairly straightforward. As for the last, there are multiple levels of granularity to...
The PyTorch-Transformers library recently added a new `AutoModel` API, which lets you instantiate any of the many pre-trained transformers that are available (BERT, GPT-2, RoBERTa, etc.). We should switch...
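A minimal sketch of what the switch might look like (the import path assumes the `pytorch_transformers` package, since renamed `transformers`; the checkpoint name is just an example):

```python
from pytorch_transformers import AutoModel, AutoTokenizer

# AutoModel/AutoTokenizer infer the correct architecture from the
# checkpoint name, so swapping BERT for RoBERTa, GPT-2, etc. is a
# one-line change.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
```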
When batching data, Saber truncates or right-pads each sequence to a length of `saber.constants.MAX_SENT_LEN`. Truncating sequences should only happen on the train set, ensuring that we don't drop examples...
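A minimal sketch of the current per-sequence behavior (`MAX_SENT_LEN` stands in for `saber.constants.MAX_SENT_LEN`; the helper name is made up). Under the proposed change, the truncation step would be applied to the train partition only:

```python
MAX_SENT_LEN = 100  # stand-in for saber.constants.MAX_SENT_LEN

def truncate_and_pad(seq, max_len=MAX_SENT_LEN, pad_value=0):
    # Drop any tokens past max_len, then right-pad to exactly max_len.
    seq = seq[:max_len]
    return seq + [pad_value] * (max_len - len(seq))
```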
Currently, we are using `keras.preprocessing.sequence.pad_sequences` to pad sequences. This function is easy to use and convenient, but given that we have dropped Keras support (#157) we will need to find...
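One Keras-free option, sketched here with PyTorch (the function name and signature are assumptions that loosely mirror the Keras helper):

```python
import torch

def pad_sequences(sequences, max_len, pad_value=0):
    # Right-pad (and truncate) a list of token-id lists into a LongTensor.
    batch = torch.full((len(sequences), max_len), pad_value, dtype=torch.long)
    for i, seq in enumerate(sequences):
        seq = seq[:max_len]
        batch[i, : len(seq)] = torch.as_tensor(seq, dtype=torch.long)
    return batch
```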
There is currently no easy way to evaluate a trained model. There should be some kind of interface for this, e.g.

```python
from saber import Saber

sb = Saber()
sb.load('path/to/some/model')
...
```
Use a decorator to time functions in the `Saber` class. See https://realpython.com/primer-on-python-decorators/
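A minimal sketch of such a decorator (the name `timed` and the use of `print` are assumptions; Saber's logger could be substituted):

```python
import functools
import time

def timed(func):
    """Report how long the decorated function takes to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper
```

Methods like `Saber.load()` could then simply be decorated with `@timed`.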
In the docs, models for each major entity type are listed, but not all of them are implemented. The user should get an error when they try to load these...
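A minimal sketch of the guard (the function name and the set of implemented models are assumptions, for illustration only):

```python
AVAILABLE_PRETRAINED = {"PRGE", "LIVB", "DISO"}  # assumed contents

def load_pretrained(name):
    # Fail fast with an informative error rather than an obscure
    # download or deserialization failure further down the stack.
    if name not in AVAILABLE_PRETRAINED:
        raise ValueError(
            f"'{name}' is listed in the docs but is not yet implemented. "
            f"Available models: {sorted(AVAILABLE_PRETRAINED)}"
        )
```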
Currently, the `config.ini` file, which contains settings for using Saber, is highly coupled to the Keras BiLSTM-CRF model. This needs to be fixed. One solution would be to maintain a...
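For illustration only, one possible direction is to namespace model-specific settings so that the top-level section stays model-agnostic (every section and key name below is hypothetical):

```ini
; Hypothetical config.ini layout; all names here are assumptions.
[saber]
dataset_folder = path/to/dataset
output_folder = path/to/output

[bilstm-crf]
word_embed_dim = 200
dropout_rate = 0.3
```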
Currently, the splitting of training data (into a validation or cross-validation split(s)) happens in `prepare_data_for_training()`, which is defined by each model. It should be moved from the models themselves to...
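A minimal sketch of a model-agnostic helper, assuming scikit-learn is available (the function name and defaults are made up):

```python
from sklearn.model_selection import train_test_split

def make_validation_split(X, y, valid_size=0.1, seed=42):
    # Split once, outside of any model class, so that every model
    # trains and validates on identical partitions.
    return train_test_split(X, y, test_size=valid_size, random_state=seed)
```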
Currently, to align BERT tokens to original tokens (before BERT tokenization) we use some code I grabbed from the official BERT repo. spaCy has introduced [functions specifically for aligning two...
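For reference, a sketch using spaCy v2's `spacy.gold.align` (the API moved to `spacy.training` in v3, so the import path is version-dependent):

```python
from spacy.gold import align  # spaCy v2.x

bert_tokens = ["obama", "'", "s", "podcasts", "."]
orig_tokens = ["obama", "'s", "podcasts", "."]

# a2b and b2a map token indices between the two tokenizations; the
# *_multi dicts cover one-to-many alignments.
cost, a2b, b2a, a2b_multi, b2a_multi = align(bert_tokens, orig_tokens)
```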