spacyr

Hyphenated words

Open • seb-29 opened this issue 9 months ago • 1 comment

The spaCy tokenizer splits hyphenated words by inserting a space before and after the hyphen. For example, "eye-opening" becomes "eye - opening". Is there a way to keep hyphenated words together, like with the quanteda tokenizers? (@JBGruber: Any idea? :))
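One possible workaround, sketched here only as an illustration, is to rejoin the pieces after tokenization. The split comes from spaCy's infix rules, and spaCy itself lets you modify those rules. Whether spacyr exposes that setting is not assumed here. The sketch below does not touch the tokenizer. It assumes each token is a hypothetical `(text, trailing_whitespace)` pair, modelled on spaCy's `Token.text` and `Token.whitespace_`. It merges `word`, `-`, `word` runs only when no whitespace surrounded the hyphen in the original text, so a spaced dash such as "eye - opening" in the source stays split.

```python
from typing import List, Tuple

# Hypothetical token representation: (text, trailing_whitespace), modelled on
# spaCy's Token.text and Token.whitespace_ attributes. Not a spacyr API.
Token = Tuple[str, str]


def merge_hyphenated(tokens: List[Token]) -> List[Token]:
    """Rejoin word/'-'/word runs that had no whitespace around the hyphen."""
    merged: List[Token] = []
    i = 0
    while i < len(tokens):
        text, ws = tokens[i]
        # Keep absorbing "-" plus the following token as long as neither the
        # current piece nor the hyphen was followed by whitespace.
        while (
            ws == ""
            and i + 2 < len(tokens)
            and tokens[i + 1][0] == "-"
            and tokens[i + 1][1] == ""
        ):
            text += "-" + tokens[i + 2][0]
            ws = tokens[i + 2][1]
            i += 2
        merged.append((text, ws))
        i += 1
    return merged


# "eye-opening results", as split by a tokenizer that breaks on infix hyphens.
tokens = [("eye", ""), ("-", ""), ("opening", " "), ("results", "")]
print([t for t, _ in merge_hyphenated(tokens)])  # ['eye-opening', 'results']
```

Because the inner loop repeats, multi-hyphen compounds such as "state-of-the-art" are rebuilt as a single token.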

seb-29, May 15 '24 20:05