
Fix cut_all mixed chinese & english issue

Open messense opened this issue 5 years ago • 2 comments

This needs the same fix as the Python version: https://github.com/fxsjy/jieba/commit/97c32464e122055b10d511bfd1eab0b38b08622a

messense avatar Jul 19 '20 13:07 messense

cc @MnO2

messense avatar Jul 19 '20 14:07 messense

@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of the Chinese vocabulary. Product names can contain not only the English alphabet but also Japanese hiragana. I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a work-around like the one in the Python implementation for practical reasons.

MnO2 avatar Jul 19 '20 15:07 MnO2
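
For illustration only, the kind of work-around discussed in this thread might look like a post-processing pass that glues consecutive single-character ASCII tokens back into one word, so that English text is not emitted letter by letter in cut_all output. This is a sketch under stated assumptions, not the code from the linked Python commit and not part of the jieba-rs API: the helper name `merge_ascii_runs` and the choice to treat only ASCII letters and digits as "English" are hypothetical.

```rust
/// Hypothetical helper (not part of jieba-rs): merge consecutive tokens that
/// consist solely of ASCII letters/digits into a single token, and pass every
/// other token through unchanged.
fn merge_ascii_runs(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    // Buffer holding the ASCII run that is currently being accumulated.
    let mut buf = String::new();

    for &tok in tokens {
        // Assumption: "English" means non-empty and ASCII alphanumeric only.
        let is_ascii_word =
            !tok.is_empty() && tok.chars().all(|c| c.is_ascii_alphanumeric());

        if is_ascii_word {
            buf.push_str(tok);
        } else {
            // A non-ASCII token ends the current run; flush it first.
            if !buf.is_empty() {
                out.push(std::mem::take(&mut buf));
            }
            out.push(tok.to_string());
        }
    }

    // Flush a run that reaches the end of the input.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

Calling `merge_ascii_runs(&["中", "文", "a", "b", "c", "测"])` would keep the Chinese tokens as they are and join `a`, `b`, `c` into `abc`. As noted above, this does not address other scripts such as hiragana; it only covers the practical ASCII case.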