tokenizer
Issue 6
Fix #6
Created a new splittable category, `PRE_N_POST_ONLY`, for characters that can be both prefixes and suffixes but are only split off when they appear at the beginning or end of a token. The one exception is adjacency to other splittables: such a character is still split off as long as only other splittables separate it from the token's edge.
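A minimal Python sketch of the edge rule described above, assuming tokens have already been split on whitespace. The sets `PRE_N_POST_ONLY` and `OTHER_SPLITTABLES` and the function `is_pre_n_post_split` are hypothetical stand-ins, not the tokenizer's actual code or character lists.

```python
# Hypothetical character sets; the real tokenizer's categories may differ.
PRE_N_POST_ONLY = {"'"}
OTHER_SPLITTABLES = {"(", ")", ".", ",", "!", "?"}


def is_pre_n_post_split(token: str, i: int) -> bool:
    """Return True if token[i] is a PRE_N_POST_ONLY character that should be split off.

    It qualifies only if everything before it, or everything after it,
    consists of splittables (or nothing at all), i.e. it sits at the
    token's edge once surrounding splittables are ignored.
    """
    if token[i] not in PRE_N_POST_ONLY:
        return False
    splittables = PRE_N_POST_ONLY | OTHER_SPLITTABLES
    at_start = all(c in splittables for c in token[:i])
    at_end = all(c in splittables for c in token[i + 1:])
    return at_start or at_end
```

For example, `is_pre_n_post_split("('test", 1)` is true because only `(` precedes the quote, while `is_pre_n_post_split("l'imagerie", 1)` is false because letters sit on both sides.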
Taking the single quote `'` as a `PRE_N_POST_ONLY` splittable, the following would be valid uses as a splittable:
- `'test quotes'`
- `'test quotes'.` <- suffixed by another splittable
- `('test quotes').` <- prefixed and suffixed by another splittable
The following would not be valid uses as a splittable, because the quote sits inside the word:
- `l'interrelation`
- `l'imagerie`
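To check the examples above end to end, here is a hedged sketch of a whitespace tokenizer that peels splittables off both edges of each chunk, so an interior quote is never split. The character sets and `tokenize` are hypothetical; for simplicity the sketch also treats the other splittables as edge-only, which the real tokenizer may not do.

```python
# Hypothetical character sets; the real tokenizer's categories may differ.
PRE_N_POST_ONLY = {"'"}
OTHER_SPLITTABLES = {"(", ")", ".", ",", "!", "?"}
SPLITTABLES = PRE_N_POST_ONLY | OTHER_SPLITTABLES


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then peel splittables off the edges of each chunk."""
    tokens = []
    for chunk in text.split():
        # Advance past leading splittables (quotes included).
        start = 0
        while start < len(chunk) and chunk[start] in SPLITTABLES:
            start += 1
        # Back up past trailing splittables, never crossing `start`.
        end = len(chunk)
        while end > start and chunk[end - 1] in SPLITTABLES:
            end -= 1
        tokens.extend(chunk[:start])  # each leading splittable becomes its own token
        if start < end:
            tokens.append(chunk[start:end])  # the word, interior quotes kept intact
        tokens.extend(chunk[end:])  # each trailing splittable becomes its own token
    return tokens


print(tokenize("('test quotes')."))
# ['(', "'", 'test', 'quotes', "'", ')', '.']
print(tokenize("l'interrelation"))
# ["l'interrelation"]
```

Peeling from both ends is what gives the "prefixed/suffixed by other splittables" exception for free: the quote in `('test` is reached only after `(` has been removed, whereas the quote in `l'imagerie` is never at an edge.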