Getting UnicodeDecodeError accessing trie read from file
Hi, I'm consistently getting the following error when trying to access a trie after loading or reading it from a file.
./read_trie_test.py
Traceback (most recent call last):
  File "./read_trie_test.py", line 18, in <module>
    print(t.restore_key(0))
  File "marisa_trie.pyx", line 324, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6365)
  File "marisa_trie.pyx", line 334, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6299)
  File "marisa_trie.pyx", line 62, in marisa_trie._get_key (src/marisa_trie.cpp:1615)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 10: invalid start byte
I get the same error if the following code is used...
for k in t.keys():
    print(k)
and again the same error if I use:
t['someKey'] # or t[u'somekey']
The trie file reads in without any error. I've written the file using both trie.save() and trie.write(), and when writing the file I used codecs.open() and write() to force UTF-8 encoding.
I'm not sure if this is similar to issue #10.
OK, never mind. I was taking the examples a little too literally.
I was loading a saved BytesTrie into a constructed Trie(). Once I switched to a constructed BytesTrie(), it worked fine.
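For anyone landing here with the same traceback, here is a rough, library-free illustration of why the byte 0xff shows up. It assumes (this is not confirmed anywhere in this thread) that a BytesTrie stores each entry as the UTF-8 key, a 0xff separator byte, and then the raw value. The key, the value, and both function names are made up for the illustration.

```python
# Hypothetical raw entry: a 10-byte key, an assumed 0xff separator, a bytes value.
raw_entry = b"someKey123" + b"\xff" + b"payload"


def decode_as_text_key(raw: bytes) -> str:
    """Treat the whole entry as a UTF-8 key, as a text-keyed trie would."""
    return raw.decode("utf-8")


def split_key_and_value(raw: bytes, sep: bytes = b"\xff"):
    """Split on the separator first, then decode only the key part."""
    key, _, value = raw.partition(sep)
    return key.decode("utf-8"), value


try:
    decode_as_text_key(raw_entry)
    error_message = None
except UnicodeDecodeError as exc:
    # 0xff is never valid in UTF-8, so decoding stops at the separator.
    error_message = str(exc)
    print(error_message)

print(split_key_and_value(raw_entry))
```

Under that assumption, the failing position in the error is simply the length of the key, which would explain "position 10" for a 10-byte key.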
I'm glad it is not a bug in the marisa-trie source code :) Do you have any suggestions about how to change the docs to make them more clear regarding this?
So am I :)
As for the documentation: at the end of the load/save section, I'd just call out that a Trie() instance will not correctly load a saved RecordTrie or BytesTrie, even though load() does not fail. You need to construct the same trie class as the one you are trying to load.
Alternatively, the load() methods could throw an exception if a trie file of the wrong type is presented.
Part of the problem here is that the BytesTrie class should offer a static method for loading. The thought process that I think both jottos and I encountered was:
- Okay, my trie is saved, now I want to load it
- Huh, that's weird, the load method requires you to already have a trie. I guess I'll create an empty trie first; how do I do that? Oh right, marisa_trie.Trie().
If you could call BytesTrie.load('trie.marisa') as a static method, it would be easier to not go astray.
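Until something like that exists, a small helper can make the class explicit at the call site. This is only a sketch: load_trie is a hypothetical name, not part of marisa_trie, and it assumes nothing beyond what this thread describes, namely that a trie class can be constructed first and then has a load(path) method.

```python
def load_trie(trie_cls, path, *args, **kwargs):
    """Build an empty trie of the given class, then load `path` into it.

    Extra arguments are passed to the constructor, for classes that
    need them (hypothetical helper, not a marisa_trie API).
    """
    trie = trie_cls(*args, **kwargs)
    trie.load(path)
    return trie
```

With this, the call would read load_trie(marisa_trie.BytesTrie, 'trie.marisa'), so the class you saved and the class you load are named in the same place.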