cppjieba
Is there any API that counts offsets by characters (including Chinese) instead of bytes?
Hi,
I'm trying to use Jieba.Cut(text, result) here, but the result shows that it counts offsets by bytes, not Unicode characters.
My text mixes Chinese and English characters, so I wonder if there is any way to get character-based offsets. Thanks for your great work!
Would it be OK to just iterate over the words with a vector<cppjieba::Word>::iterator, count the code points yourself, and assign the "sequence number" you want? If you are using this character offset to evaluate proximity, the number does not have to be very precise.