John Giorgi

64 issues by John Giorgi

Need to train models for each major entity class: `PRGE`, `LIVB`, `DISO`, `CHED`. The first three are fairly straightforward. As for the last, there are multiple levels of granularity to...

enhancement
production

The PyTorch-Transformers library recently added a new `AutoModel` API, which lets you instantiate any of the many available pre-trained transformers (BERT, GPT-2, RoBERTa, etc.). We should switch...
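
A minimal sketch of loading a model through this API, using the Hugging Face `transformers` package as it exists today (the issue predates the rename from `pytorch-transformers`, so the exact calls may differ slightly):

```python
# Sketch only: the model name "bert-base-cased" is an example, not necessarily
# what Saber would ship with.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

inputs = tokenizer("Saber is a sequence labelling tool.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```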

enhancement
feature
design

When batching data, Saber truncates or right-pads each sequence to a length of `saber.constants.MAX_SENT_LEN`. Truncation should only happen on the train set, ensuring that we don't drop examples...
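
A rough sketch of the intended behaviour; the helper below is illustrative and not Saber's actual API:

```python
# Illustrative only: truncate + pad the train set to a fixed length, but only pad
# evaluation data so no tokens are silently dropped.
MAX_SENT_LEN = 5  # stand-in for saber.constants.MAX_SENT_LEN

def pad_or_truncate(sequences, max_len=None, pad_value=0):
    """Right-pad token-ID sequences; truncate only when an explicit max_len is given."""
    if max_len is None:  # evaluation: pad to the longest sentence instead of truncating
        max_len = max(len(seq) for seq in sequences)
    return [seq[:max_len] + [pad_value] * (max_len - len(seq[:max_len])) for seq in sequences]

train = [[1, 2, 3, 4, 5, 6, 7], [8, 9]]
test = [[1, 2, 3, 4, 5, 6, 7], [8, 9]]

print(pad_or_truncate(train, max_len=MAX_SENT_LEN))  # [[1, 2, 3, 4, 5], [8, 9, 0, 0, 0]]
print(pad_or_truncate(test))  # [[1, 2, 3, 4, 5, 6, 7], [8, 9, 0, 0, 0, 0, 0]]
```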

invalid

Currently, we are using `keras.preprocessing.sequence.pad_sequences` to pad sequences. This function is easy to use and convenient, but given that we have dropped Keras support (#157), we will need to find...
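
One possible replacement is PyTorch's built-in padding utility; this is only a sketch of the option, not a decision (note that it pads to the longest sequence in the batch rather than to a fixed length):

```python
# Pads a list of variable-length tensors into a single (batch, max_len) tensor.
import torch
from torch.nn.utils.rnn import pad_sequence

token_ids = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6])]
padded = pad_sequence(token_ids, batch_first=True, padding_value=0)
print(padded)
# tensor([[1, 2, 3, 4],
#         [5, 6, 0, 0]])
```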

chore

There is currently no easy way to evaluate a trained model. There should be some kind of interface for this, e.g.

```python
from saber import Saber

sb = Saber()
sb.load('path/to/some/model')
...
```
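
One possible shape for the rest of that interface; the `evaluate()` method, its argument, and its return value are assumptions for illustration, not an existing Saber API:

```python
from saber import Saber

sb = Saber()
sb.load('path/to/some/model')

# Hypothetical: the method name, dataset path, and return format are assumptions.
metrics = sb.evaluate('path/to/some/dataset')
print(metrics)  # e.g. a dict of precision / recall / F1 scores
```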

enhancement
invalid

Use a decorator to time functions in the `Saber` class. https://realpython.com/primer-on-python-decorators/
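
A minimal sketch of the idea from the linked primer (the decorator name is illustrative):

```python
import functools
import time

def timeit(func):
    """Print how long `func` took to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.2f} seconds")
        return result
    return wrapper

@timeit
def slow_function():
    time.sleep(1)

slow_function()  # prints something like "slow_function took 1.00 seconds"
```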

In the docs, models for each major entity type are listed, but not all of them are implemented. The user should get an error when they try to load these...
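
Illustratively, the guard could look something like the sketch below; which entity types actually have released models isn't pinned down here, so the set is a placeholder:

```python
# Placeholder set, not Saber's actual registry of released models.
IMPLEMENTED = {"PRGE", "LIVB", "DISO"}

def load_pretrained(entity_type):
    """Raise a helpful error instead of failing cryptically on an unreleased model."""
    if entity_type not in IMPLEMENTED:
        raise ValueError(
            f"No pre-trained model is available for '{entity_type}' yet. "
            f"Available models: {sorted(IMPLEMENTED)}."
        )
    # ... load and return the model here ...
```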

Currently, the `config.ini` file, which contains settings for using Saber, is highly coupled to the Keras BiLSTM-CRF model. This needs to be fixed. One solution would be to maintain a...

enhancement
invalid

Currently, the splitting of training data (into a validation split or cross-validation splits) happens in `prepare_data_for_training()`, which is defined by each model. It should be moved from the models themselves to...
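
For illustration, a model-agnostic split helper built on scikit-learn might look like the sketch below; where it should live is exactly what this issue leaves open:

```python
# Sketch only: function and argument names are illustrative, not Saber's actual API.
from sklearn.model_selection import KFold, train_test_split

def make_splits(examples, k_folds=None, validation_size=0.1, random_state=42):
    """Return either a single train/validation split or k cross-validation folds."""
    if k_folds:
        kf = KFold(n_splits=k_folds, shuffle=True, random_state=random_state)
        return list(kf.split(examples))  # list of (train_indices, valid_indices) pairs
    return train_test_split(examples, test_size=validation_size, random_state=random_state)
```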

enhancement
invalid

Currently, to align BERT tokens to the original tokens (before BERT tokenization), we use some code I grabbed from the official BERT repo. spaCy has introduced [functions specifically for aligning two...
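
A sketch using the alignment helper in recent spaCy versions (`spacy.training.Alignment`, v3+); the issue may be referring to an earlier spaCy API, so treat the exact calls as an assumption:

```python
from spacy.training import Alignment

# Wordpieces with the "##" prefixes already stripped, so both tokenizations
# concatenate to the same underlying string.
bert_tokens = ["john", "gio", "rgi"]
orig_tokens = ["john", "giorgi"]

align = Alignment.from_strings(bert_tokens, orig_tokens)
print(align.x2y.lengths)  # how many original tokens each wordpiece maps to
print(align.x2y.data)     # the aligned indices, e.g. 0, 1, 1
```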