Transformer_Temporal_Tagger
Automatically batch texts when too long
For samples that exceed the 512 subword token limit, we currently have no strategy in place. This is both unwanted and relatively easy to improve. There are a few considerations regarding the exact strategy, but a good starting point would be to approximate sentence boundaries with a lightweight spaCy model and then greedily group sentences into chunks that stay under the maximum length.
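As an illustration, here is a minimal sketch of that approach. It assumes spaCy's `en_core_web_sm` pipeline for sentence splitting and a Hugging Face tokenizer to count subwords; the function name `chunk_text` and the `bert-base-uncased` tokenizer are hypothetical placeholders, not part of this repository.

```python
# Minimal sketch: sentence-based chunking under a subword token budget.
# `chunk_text`, MAX_TOKENS, and the tokenizer choice are illustrative assumptions.
import spacy
from transformers import AutoTokenizer

MAX_TOKENS = 512  # model's subword token limit

nlp = spacy.load("en_core_web_sm")  # lightweight pipeline for sentence splitting
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder tokenizer

def chunk_text(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split `text` into chunks of whole sentences, each within `max_tokens` subwords."""
    sentences = [sent.text.strip() for sent in nlp(text).sents]
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        # Approximate length by subword count; leave headroom for the
        # special tokens ([CLS]/[SEP]) the model adds per chunk.
        sent_len = len(tokenizer.tokenize(sent))
        if current and current_len + sent_len > max_tokens - 2:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += sent_len
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than the limit would still overflow and would need a harder split (e.g. on token windows); that edge case is one of the open considerations mentioned above.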