segtok icon indicating copy to clipboard operation
segtok copied to clipboard

bug: split_contractions fails for certain patterns

Open MattGPT-ai opened this issue 1 year ago • 0 comments

For example:

split_contractions(word_tokenizer("OʼHaraʼs"))
<<< ['OʼHara', 'O', 'ʼHaraʼs']

MattGPT-ai avatar Aug 30 '24 11:08 MattGPT-ai