jieba-php

Issue with handling numbers, decimals, percentages in text

Open p4u1d34n opened this issue 6 years ago • 1 comments

see image

https://imgur.com/jamohRG

When using the tool to segment content that contains decimal numbers, the number is split at the decimal point. Also, if a percentage is used, the % symbol is stripped out. Is there a way to preserve these items?

p4u1d34n, Jan 30 '19 12:01

I managed to work around this by adapting the cut method in Posseg.php so that the punctuation pattern also includes \x{0025} (%) and \x{002E} (.):

$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'. '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'. '\x{ff0c}\x{ff1f}\x{3002}\x{0025}\x{002E}]+)';
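To illustrate why adding those two code points helps, here is a minimal stand-alone sketch. It does not call jieba-php. It assumes the segmenter first collects blocks with an alternation of Han, alphanumeric and punctuation classes, and that anything matching none of them is dropped. The function `split_blocks` and the shortened punctuation classes are hypothetical, made up for this demo only.

```php
<?php
// Hypothetical stand-alone demo; NOT the actual jieba-php code.
// Assumption: blocks are collected by an alternation of character classes,
// and characters that match none of the classes are silently dropped.

function split_blocks(string $text, string $punctuation): array
{
    $han  = '[\x{4e00}-\x{9fa5}]+';  // CJK ideographs
    $skip = '[a-zA-Z0-9+#]+';        // ASCII letters, digits, + and #
    preg_match_all('/' . $han . '|' . $skip . '|' . $punctuation . '/u', $text, $m);
    return $m[0];
}

// Shortened punctuation class, without and with U+0025 (%) and U+002E (.)
$before = '[\x{ff0c}\x{3002}]+';
$after  = '[\x{ff0c}\x{3002}\x{0025}\x{002E}]+';

$text = '增长3.5%';

print_r(split_blocks($text, $before)); // ['增长', '3', '5']            '.' and '%' are lost
print_r(split_blocks($text, $after));  // ['增长', '3', '.', '5', '%']  both symbols survive
```

In this sketch the digits still end up in separate tokens, but "." and "%" are kept as tokens of their own, so the original number can be reassembled afterwards.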

https://imgur.com/a/dLuNNHm

p4u1d34n, Jan 31 '19 10:01