Repeated `parseToNode` calls invalidate `char_type`

Open: polm opened this issue 1 year ago • 0 comments

See https://github.com/polm/cutlet/issues/59 for details. Minimal repro:

```python
from fugashi import Tagger

tagger = Tagger()
xx = tagger("日本語")
print(xx[0].char_type)  # => 2
tagger("にほんご")  # note this is not assigned anywhere
print(xx[0].char_type)  # => 6, this is wrong
```

This probably affects other members of `cnode` as well, so they will have to be copied eagerly when each node is created. That may have a performance impact, but we'll have to put up with it.

There could be an "unsafe mode" that returns nodes that can be invalidated by a later parse but are faster because they don't trigger allocations. However, I'm not sure there's much demand for more speed.
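To make the trade-off concrete, here is a toy model in plain Python. It does not use fugashi or MeCab at all: `ToyTagger`, `LazyNode`, `EagerNode`, and the `unsafe` flag are hypothetical names invented for this sketch, and the values 2 and 6 are simply borrowed from the repro above. It only assumes that the tagger reuses one internal buffer across parses, which is what the repro suggests.

```python
from dataclasses import dataclass


class ToyTagger:
    """Hypothetical stand-in for a tagger that reuses one buffer per parse."""

    def __init__(self, unsafe=False):
        self.unsafe = unsafe
        self._buffer = []  # overwritten in place on every parse

    def __call__(self, text):
        # Toy rule: 2 for characters in the CJK ideograph range, 6 otherwise.
        self._buffer[:] = [2 if "\u4e00" <= ch <= "\u9fff" else 6 for ch in text]
        if self.unsafe:
            # No copy: each node reads from the shared buffer on access.
            return [LazyNode(self._buffer, i) for i in range(len(text))]
        # Eager copy: each node owns its value from the moment it is created.
        return [EagerNode(char_type=value) for value in self._buffer]


class LazyNode:
    """Reads char_type from the shared buffer every time it is accessed."""

    def __init__(self, buffer, index):
        self._buffer = buffer
        self._index = index

    @property
    def char_type(self):
        return self._buffer[self._index]


@dataclass(frozen=True)
class EagerNode:
    """Holds its own copy of char_type, so later parses cannot change it."""

    char_type: int


safe = ToyTagger()
kept = safe("日本語")
safe("にほんご")
print(kept[0].char_type)  # => 2, still correct after the second parse

fast = ToyTagger(unsafe=True)
stale = fast("日本語")
fast("にほんご")
print(stale[0].char_type)  # => 6, overwritten by the second parse
```

The eager path pays for one extra object per node on every parse, which is the performance cost mentioned above. The lazy path avoids that cost but is only correct if the caller finishes with the nodes before parsing again.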

polm, Nov 19 '24 14:11