cppjieba

Is there any API that counts offsets by characters (including Chinese) instead of bytes?

Open · royguo opened this issue 8 years ago · 1 comment

Hi, I'm trying to use Jieba.Cut(text, result), but the result shows that it counts offsets in bytes, not Unicode characters. My text contains a mix of Chinese and English characters, so I wonder whether there is any way to get offsets counted in characters instead. Thanks for your great work!

royguo · Oct 13 '16 03:10

Would it be OK to just iterate over the words with vector<cppjieba::Word>::iterator, counting as you go, and assign the "sequence number" you want? If you are using this character offset to evaluate proximity, then I think the number does not have to be very precise.

w32zhong · Oct 13 '16 06:10