cppjieba
Is there any API that counts offsets by characters (including Chinese) instead of bytes?
Hi,
I'm trying to use Jieba.Cut(text, result) here, but the result shows that it counts offsets by bytes, not Unicode characters.
My text mixes Chinese and English characters, so I wonder if there is any way to get character-based offsets. Thanks for your great work!
Would it be OK to just iterate over the words with a vector<cppjieba::Word>::iterator, count the code points yourself, and assign the "sequence number" you want? If you are using this character offset to evaluate proximity, the number does not have to be very precise.