marisa-trie

Getting UnicodeDecodeError accessing trie read from file

Open jottos opened this issue 10 years ago • 4 comments

Hi, I'm consistently getting the following error when trying to access a trie after a load() or read() from a file.

./read_trie_test.py
Traceback (most recent call last):
  File "./read_trie_test.py", line 18, in <module>
    print(t.restore_key(0))
  File "marisa_trie.pyx", line 324, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6365)
  File "marisa_trie.pyx", line 334, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6299)
  File "marisa_trie.pyx", line 62, in marisa_trie._get_key (src/marisa_trie.cpp:1615)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 10: invalid start byte

I get the same error if the following code is used...

  for k in t.keys():
      print(k)

and again the same error if I use:

  t['someKey']  # or t[u'somekey']

The trie file reads in without any error. I've written the file using both trie.save() and trie.write(), and when writing I've used codecs.open() and its write() to force UTF-8 encoding.

I'm not sure if this is similar to issue #10.

jottos avatar Jan 26 '15 05:01 jottos

OK, never mind. I was taking the examples a little too literally.

I was loading a saved BytesTrie into a constructed Trie(). Once I switched to a constructed BytesTrie(), it worked fine.
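
A minimal sketch of why the mismatch might surface as a UnicodeDecodeError rather than as a load failure. It assumes (this is not confirmed anywhere in this thread) that a BytesTrie stores each entry as the UTF-8 key, a b"\xff" separator, and the raw value, and that a plain Trie decodes every stored entry as UTF-8. The helper names are hypothetical and the snippet uses only the standard library, not marisa_trie itself.

```python
# Assumption: BytesTrie entries look like key + b"\xff" + value on disk,
# while Trie treats every stored entry as a UTF-8 encoded key.
SEPARATOR = b"\xff"  # assumed internal separator; 0xff never occurs in valid UTF-8


def stored_entry(key, value):
    """Hypothetical helper: build one entry the way a BytesTrie is assumed to."""
    return key.encode("utf-8") + SEPARATOR + value


def decode_as_plain_key(entry):
    """Hypothetical helper: decode an entry the way a plain Trie is assumed to."""
    return entry.decode("utf-8")


entry = stored_entry("tenletters", b"payload")  # the key is 10 bytes long

try:
    decode_as_plain_key(entry)
    error = None
except UnicodeDecodeError as exc:
    # The decoder stops at the separator, right after the 10-byte key.
    error = exc

print(error)
```

Under that assumption, the failing position reported in the error is simply the length of the key, which would be consistent with the "position 10" in the traceback above.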

jottos avatar Jan 26 '15 06:01 jottos

I'm glad it is not a bug in the marisa-trie source code :) Do you have any suggestions about how to change the docs to make them more clear regarding this?

kmike avatar Jan 26 '15 08:01 kmike

So am I :)

As for the documentation, at the end of the load/save section I'd just call out that a trie built with the Trie() constructor will not correctly load a RecordTrie or a BytesTrie file, even though the load itself will not fail. You need to construct the same trie class as the one you are trying to load.

Alternatively, the load() methods could throw an exception if a trie file of the wrong type is presented.

jottos avatar Jan 29 '15 06:01 jottos

Part of the problem here is that the BytesTrie class should offer a static method for loading. The thought process that I think both jottos and I encountered was:

  • Okay, my trie is saved, now I want to load it
  • Huh, that's weird, the load method requires you to already have a trie
  • I guess I'll create an empty trie first, how do I do that? Oh right, marisa_trie.Trie().

If you could call BytesTrie.load('trie.marisa') as a static method, it would be easier to not go astray.
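
Until something like that exists, a small wrapper can approximate it. This is only a sketch: load_trie is a hypothetical helper, not part of marisa-trie, and it assumes nothing beyond what this thread describes, namely that the trie class can be constructed empty and has a load(path) method.

```python
def load_trie(trie_cls, path):
    """Hypothetical helper: build an empty trie of trie_cls and load path into it.

    Passing the class explicitly makes the caller state which kind of trie
    the file is supposed to contain.
    """
    trie = trie_cls()   # empty trie of the requested class
    trie.load(path)     # instance-level load, as described in this thread
    return trie
```

Usage would look like load_trie(marisa_trie.BytesTrie, 'trie.marisa'). The helper cannot detect a file of the wrong type; it only makes the intended class visible at the call site.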

rspeer avatar Apr 10 '17 18:04 rspeer