jieba-rs
Fix cut_all mixed Chinese & English issue
Same as the fix in the Python version: https://github.com/fxsjy/jieba/commit/97c32464e122055b10d511bfd1eab0b38b08622a
cc @MnO2
@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of Chinese vocabulary. Not only the English alphabet can appear in product names, but also Japanese hiragana such as の. I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a workaround like the one in the Python implementation for practical reasons.
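For readers following along, here is a rough sketch of the kind of workaround being discussed, assuming it amounts to buffering consecutive single-character ASCII alphanumeric tokens and emitting them as one token. `merge_ascii_runs` is a hypothetical helper written for illustration; it is not part of the jieba-rs API and is not the code in this PR.

```rust
/// Hypothetical helper (an assumption, not the jieba-rs API): merges runs of
/// single-character ASCII alphanumeric tokens into a single token, leaving
/// every other token untouched.
fn merge_ascii_runs(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    // Buffer holding the ASCII run currently being collected.
    let mut buf = String::new();

    for &tok in tokens {
        // True only when the token is exactly one ASCII letter or digit.
        let mut chars = tok.chars();
        let is_single_ascii_alnum = matches!(
            (chars.next(), chars.next()),
            (Some(c), None) if c.is_ascii_alphanumeric()
        );

        if is_single_ascii_alnum {
            buf.push_str(tok);
        } else {
            // A non-ASCII token ends the run: flush the buffer first.
            if !buf.is_empty() {
                out.push(std::mem::take(&mut buf));
            }
            out.push(tok.to_string());
        }
    }

    // Flush a run that reaches the end of the input.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

Given the tokens `["中", "文", "a", "b", "c", "の"]`, this returns `["中", "文", "abc", "の"]`. Note that the sketch only groups ASCII letters and digits; the hiragana case raised above is passed through unchanged, which matches the view that it is out of scope for this workaround.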