
Automatically batch texts when too long

Open dennlinger opened this issue 3 years ago • 0 comments

For samples that exceed the 512 subword token limit, we currently have no strategy in place. This is both undesirable and relatively easy to improve. There are a few considerations regarding the exact strategy, but a good starting point would be to approximate sentence boundaries with something lightweight (e.g., a rule-based spaCy pipeline), and then chunk sentences greedily up to the approximate maximum length; see the sketch below.
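A minimal sketch of what this could look like, assuming a BERT-style tokenizer (`bert-base-uncased` is used purely for illustration; it is not necessarily the tagger's checkpoint) and a budget of 510 tokens to leave room for `[CLS]`/`[SEP]`. The helper `chunk_text` is a hypothetical name, not an existing function in this repository:

```python
from typing import List

import spacy
from transformers import AutoTokenizer

# Blank pipeline with only the rule-based sentencizer: fast, no model download.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Illustrative tokenizer; any BERT-style tokenizer behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def chunk_text(text: str, max_tokens: int = 510) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens subwords.

    A single sentence longer than the budget is emitted as its own
    (still over-long) chunk; handling that case would need a harder split.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for sent in nlp(text).sents:
        sent_text = sent.text.strip()
        # Count subword tokens without special tokens, since those are
        # added once per chunk, not per sentence.
        n_tokens = len(tokenizer(sent_text, add_special_tokens=False)["input_ids"])

        if current and current_len + n_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0

        current.append(sent_text)
        current_len += n_tokens

    if current:
        chunks.append(" ".join(current))
    return chunks
```

Chunking on sentence boundaries (rather than a hard cut at 512 tokens) should keep temporal expressions intact, since a tag split across two chunks could not be recovered by either forward pass.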

dennlinger · Dec 14 '21 13:12