tokenizer

Issue 6

Open joshweir opened this issue 7 years ago • 0 comments

Fix #6

Created a new splittable category, PRE_N_POST_ONLY, which holds characters that can be both prefixes and suffixes but are treated as splittables only at the beginning or end of a token. The one exception is that other splittables may sit between them and the token boundary. Taking the single quote ' as a PRE_N_POST_ONLY splittable, the following would be valid uses as a splittable:

  • 'test quotes'
  • 'test quotes'. <- suffixed by another splittable
  • ('test quotes'). <- prefixed and suffixed by another splittable

The following would not be valid uses as a splittable:

  • l'interrelation
  • l'imagerie
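
The rule above can be illustrated with a minimal Python sketch. This is not the library's code: the character sets and the `split_token` and `tokenize` names are hypothetical, and the sketch only covers splitting at token edges (it does not model how other splittables behave inside a token).

```python
# Hypothetical character sets for illustration; the library's real lists
# are not shown in this PR.
PRE_N_POST_ONLY = {"'"}
OTHER_SPLITTABLES = set("().,;:!?")


def split_token(token):
    """Peel splittables off both ends of a token; leave interior ones alone."""
    edge_chars = PRE_N_POST_ONLY | OTHER_SPLITTABLES

    # Walk inward from the left while characters are splittables.
    start = 0
    while start < len(token) and token[start] in edge_chars:
        start += 1

    # Walk inward from the right, never crossing the left boundary.
    end = len(token)
    while end > start and token[end - 1] in edge_chars:
        end -= 1

    prefixes = list(token[:start])
    core = [token[start:end]] if start < end else []
    suffixes = list(token[end:])
    return prefixes + core + suffixes


def tokenize(text):
    """Split on whitespace, then split each token at its edges."""
    return [piece for token in text.split() for piece in split_token(token)]
```

With these assumptions, `tokenize("('test quotes').")` separates every quote and bracket from the words, while `tokenize("l'interrelation")` returns the token unchanged because the quote is in the interior.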

joshweir • Mar 31 '17 08:03