jieba-php
Issue with handling numbers, decimals, and percentages in text
see image
https://imgur.com/jamohRG
When I use the tool to segment content that contains decimal numbers, each number is split at the decimal point. Also, if a percentage is used, the % symbol is stripped out. Is there a way to preserve these items?
I managed to work around this by adapting the cut method in Posseg.php so that the punctuation pattern includes \x{0025} (%) and \x{002E} (.):
$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'. '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'. '\x{ff0c}\x{ff1f}\x{3002}\x{0025}\x{002E}]+)';
https://imgur.com/a/dLuNNHm
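For anyone who wants to check the pattern on its own, here is a minimal sketch. It only tests the regular expression in isolation and does not reproduce how Posseg.php uses it internally. The sample string is made up for illustration and is not the text from the screenshots.

```php
<?php
// Pattern copied from the post above: full-width punctuation
// plus U+0025 (%) and U+002E (.).
$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'
    . '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'
    . '\x{ff0c}\x{ff1f}\x{3002}\x{0025}\x{002E}]+)';

// Hypothetical sample text (an assumption, not from the issue).
$sample = '增长3.5%，很好。';

// The /u modifier makes PCRE treat pattern and subject as UTF-8,
// so the \x{...} escapes are read as Unicode code points.
preg_match_all('/' . $re_punctuation_pattern . '/u', $sample, $matches);

// $matches[0] holds each run of characters the pattern recognised.
print_r($matches[0]);
```

With this sample, the pattern picks up the decimal point, the run "%，", and the closing "。", which suggests that "." and "%" are now treated like the other punctuation marks rather than being dropped.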