fugashi
Repeated `parseToNode` calls invalidate `char_type`
See https://github.com/polm/cutlet/issues/59 for details. Minimal repro:
```python
from fugashi import Tagger

tagger = Tagger()
xx = tagger("日本語")
print(xx[0].char_type)  # => 2
tagger("にほんご")  # note this is not assigned anywhere
print(xx[0].char_type)  # => 6, this is wrong
```
This probably affects other members of `cnode` too, so they'll have to be eagerly copied when each node is created rather than read from `cnode` on access. The extra copying may have a performance impact, but we'll have to put up with it.
There could be an "unsafe mode" that returns nodes that can be invalidated by later calls but are faster because they don't trigger allocations. However, I'm not sure there's much demand for more speed.
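
To make the trade-off concrete, here is a toy model in plain Python. It is not fugashi's code and does not touch MeCab: `ToyParser`, `ToyTagger`, `SafeNode`, `UnsafeNode`, and the `unsafe` flag are all hypothetical names, and the values 2 and 6 are only chosen to mirror the repro above. It assumes the underlying parser reuses one internal buffer between calls, which is what the repro suggests. It shows why lazy reads go stale and eager copies don't; it doesn't model the allocation cost.

```python
from dataclasses import dataclass


class ToyParser:
    """Hypothetical stand-in for a native parser that reuses one buffer."""

    def __init__(self):
        self._buffer = []  # overwritten in place on every parse

    def parse(self, text):
        # Made-up char types: 2 for CJK ideographs, 6 for everything else.
        self._buffer[:] = [2 if "\u4e00" <= ch <= "\u9fff" else 6 for ch in text]
        return self._buffer


class UnsafeNode:
    """Reads the shared buffer on access, so a later parse changes its value."""

    def __init__(self, buffer, index):
        self._buffer = buffer
        self._index = index

    @property
    def char_type(self):
        return self._buffer[self._index]


@dataclass(frozen=True)
class SafeNode:
    """Holds its own copy of the value, taken when the node is created."""

    char_type: int


class ToyTagger:
    def __init__(self, unsafe=False):
        self._parser = ToyParser()
        self._unsafe = unsafe

    def __call__(self, text):
        buf = self._parser.parse(text)
        if self._unsafe:
            return [UnsafeNode(buf, i) for i in range(len(buf))]
        return [SafeNode(ct) for ct in buf]  # eager copy


# Eager copy: the first result survives a second call.
safe = ToyTagger()
xx = safe("日本語")
safe("にほんご")
safe_after = xx[0].char_type

# Unsafe mode: the first result is silently overwritten.
unsafe = ToyTagger(unsafe=True)
yy = unsafe("日本語")
unsafe_before = yy[0].char_type
unsafe("にほんご")
unsafe_after = yy[0].char_type

print(safe_after, unsafe_before, unsafe_after)
```

In this model the safe node keeps reporting its original value, while the unsafe node reports whatever the most recent parse left in the buffer. That is the behaviour an opt-in unsafe mode would have to document.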