Hlib issues

Results: 5 issues of Hlib

Update README.md

Update the location of DeepLearn2021: the event has moved online. See the latest news at https://irdta.eu/deeplearn2021s/

PreprocessingMetadata enhancement

* Rename `PreprocessingMetadata` -> `PreppedTokenMetadata`
* Represent the `word_boundaries` field as a list of the number of subtokens in each token, e.g. [1, 3, 1, 2] instead of [0, 1, 4,...
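To make the two representations concrete, here is a minimal sketch of converting between them. It assumes the current `word_boundaries` values are cumulative offsets into the subtoken list (the truncated example starting with 0, 1, 4 is consistent with that reading, but the issue does not say so). The function names are hypothetical and not part of the library.

```python
from itertools import accumulate
from typing import List


def boundaries_to_counts(word_boundaries: List[int]) -> List[int]:
    """Convert cumulative offsets into per-token subtoken counts.

    Assumption: word_boundaries starts at 0 and each following value is
    the index at which the next full token begins.
    """
    return [end - start for start, end in zip(word_boundaries, word_boundaries[1:])]


def counts_to_boundaries(subtoken_counts: List[int]) -> List[int]:
    """Convert per-token subtoken counts back into cumulative offsets."""
    return [0] + list(accumulate(subtoken_counts))
```

The count-based form is shorter by one element and each entry can be read on its own, while the offset-based form allows slicing the subtoken list directly.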

enhancement

By default use end-of-full-token character (</t>) instead of token boundaries (<w>, </w>) for all kinds of pre-processing for consistency

Currently:

```python
>>> api.basic("getName")
['<w>', 'get', 'Name', '</w>']
```

To be done:

```python
>>> api.basic("getName")
['get', 'Name', '</t>']
```
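The change in output format can be sketched as a conversion from the boundary-marker style to the end-of-token style. This is only an illustration, not the library's implementation: the function name is hypothetical, and it assumes that a token which is not split carries no `<w>`/`</w>` markers in the current format.

```python
from typing import List

WORD_START = "<w>"
WORD_END = "</w>"
END_OF_TOKEN = "</t>"


def to_end_of_token_style(subtokens: List[str]) -> List[str]:
    """Drop `<w>` markers and emit `</t>` where each full token ends.

    Assumption: subtokens outside a `<w>` ... `</w>` pair are complete
    tokens on their own, so `</t>` is appended after each of them too.
    """
    result: List[str] = []
    inside_word = False
    for subtoken in subtokens:
        if subtoken == WORD_START:
            inside_word = True
        elif subtoken == WORD_END:
            inside_word = False
            result.append(END_OF_TOKEN)
        else:
            result.append(subtoken)
            if not inside_word:
                result.append(END_OF_TOKEN)
    return result
```

With a single end marker, every token is terminated the same way regardless of whether it was split, which is the consistency the issue title asks for.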

enhancement

Create PreppedTokenSequence class to encapsulate getting full tokens from subtokens

The tasks for the new `PreppedTokenSequence` class are to encapsulate getting full tokens from subtokens (which is currently done by the `FullTokenIterator` class) and at the same time provide transparent access...
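A rough sketch of what such a class might look like follows. Only the class name comes from the issue; the constructor arguments, the `full_tokens` method, and the choice to expose subtokens through indexing are assumptions, since the issue text is truncated before it says what the transparent access covers.

```python
from typing import Iterator, List, Sequence


class PreppedTokenSequence:
    """Hypothetical sketch: subtokens plus the number of subtokens per full token."""

    def __init__(self, subtokens: Sequence[str], subtoken_counts: Sequence[int]) -> None:
        if sum(subtoken_counts) != len(subtokens):
            raise ValueError("subtoken counts must add up to the number of subtokens")
        self.subtokens = list(subtokens)
        self.subtoken_counts = list(subtoken_counts)

    def full_tokens(self) -> Iterator[List[str]]:
        """Yield the subtokens that make up each full token, one list per token."""
        start = 0
        for count in self.subtoken_counts:
            yield self.subtokens[start:start + count]
            start += count

    def __len__(self) -> int:
        # Length is measured in subtokens (an assumption of this sketch).
        return len(self.subtokens)

    def __getitem__(self, index: int) -> str:
        # Index-based access to individual subtokens.
        return self.subtokens[index]
```

For example, `PreppedTokenSequence(["get", "Name", "("], [2, 1])` would iterate over the full tokens `["get", "Name"]` and `["("]` while still allowing access to each subtoken by index.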

enhancement

Enhance `ParsedToken` hierarchy

* rename `SplitContainer` to `Identifier`
* make `Identifier` abstract and extend it with `SingleWordIdentifier`, `TwoWordIdentifier`, `ThreeWordIdentifier`, `FourOrMoreWordIdentifier`
* make other classes that have sub-classes abstract
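The proposed hierarchy could be sketched with Python's `abc` module as below. The class names come from the list above; the `words` attribute, the `accepts` method, and the `make_identifier` factory are hypothetical additions for illustration only.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ParsedToken(ABC):
    """Root of the hierarchy (name taken from the issue title)."""


class Identifier(ParsedToken):
    """Abstract replacement for `SplitContainer`; cannot be instantiated directly."""

    def __init__(self, words: Sequence[str]) -> None:
        if not self.accepts(len(words)):
            raise ValueError(f"{type(self).__name__} cannot hold {len(words)} word(s)")
        self.words: List[str] = list(words)

    @staticmethod
    @abstractmethod
    def accepts(word_count: int) -> bool:
        """Return True if this subclass represents identifiers of this length."""


class SingleWordIdentifier(Identifier):
    @staticmethod
    def accepts(word_count: int) -> bool:
        return word_count == 1


class TwoWordIdentifier(Identifier):
    @staticmethod
    def accepts(word_count: int) -> bool:
        return word_count == 2


class ThreeWordIdentifier(Identifier):
    @staticmethod
    def accepts(word_count: int) -> bool:
        return word_count == 3


class FourOrMoreWordIdentifier(Identifier):
    @staticmethod
    def accepts(word_count: int) -> bool:
        return word_count >= 4


def make_identifier(words: Sequence[str]) -> Identifier:
    """Hypothetical factory: pick the subclass that matches the word count."""
    for cls in (SingleWordIdentifier, TwoWordIdentifier,
                ThreeWordIdentifier, FourOrMoreWordIdentifier):
        if cls.accepts(len(words)):
            return cls(words)
    raise ValueError("an identifier needs at least one word")
```

Because `Identifier` declares an abstract method, attempting `Identifier(["x"])` raises `TypeError`, which is the behaviour the second bullet asks for.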

enhancement