deepdish
UnicodeDecodeError when using Python 2.7
Hello,
I'm using deepdish to save a dictionary whose keys are unicode strings and whose values are numpy arrays holding computed embeddings. Here is a small example that reproduces the exception:
import deepdish as dd
import numpy as np
d = {'foo': np.ones((10, 20)), 'sub': {'bar': 'a string', 'é': 1.23}}
dd.io.save('test.h5', d)
And the raised exception is:
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/path.py:112: NaturalNameWarning: object name is not a valid Python identifier: '\xc3\xa9'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
NaturalNameWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 584, in save
    filters=filters, idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 212, in _save_level
    idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 297, in _save_level
    setattr(group._v_attrs, name, level)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 481, in __setattr__
    self._g__setattr(name, value)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 423, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "tables/hdf5extension.pyx", line 658, in tables.hdf5extension.AttributeSet._g_setattr (tables/hdf5extension.c:7458)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Any idea how to overcome this problem?
Thanks!
I have the same problem. You have to encode every unicode key before saving: u'é'.encode('utf-8')
That does not work for me. Here is what I did:
import deepdish as dd
import numpy as np
d = {'foo': np.ones((10, 20)), 'sub': {'bar': 'a string', u'é'.encode("utf-8"): 1.23}}
dd.io.save('test.h5', d)
And I get the same exception.
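A possible explanation for why the encoding step changes nothing (this is a reading of the traceback, not something confirmed by the deepdish or PyTables maintainers): under Python 2 with a UTF-8 source or terminal, the plain literal 'é' is already the byte string '\xc3\xa9', which is what the NaturalNameWarning prints, so u'é'.encode('utf-8') produces exactly the same key. The error text also matches what an ASCII decode of those bytes produces. The sketch below only demonstrates that codec behaviour in isolation; it does not call deepdish or PyTables, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Illustration only: shows the codec behaviour, not deepdish internals.

key_text = u'\xe9'                    # the character "é" as text
key_bytes = key_text.encode('utf-8')  # what u'é'.encode('utf-8') returns

# These are the same two bytes shown in the NaturalNameWarning.
print(repr(key_bytes))

# Decoding those bytes as ASCII fails on the first byte (0xc3).
try:
    key_bytes.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If that reading is right, encoding the key to UTF-8 cannot help, because the failing step is a decode of those very bytes.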
Thank you so much for finding this issue! I don't use Python 2 much, so I am happy that this has been identified.
So far, I have found two issues with unicode under Python 2. The first was a bug on my side that prevented saving with unicode group names. This has just been fixed, so you can pick it up with pip install -U deepdish.
However, the other problem is reading files with unicode group names, and this seems to be an issue with PyTables, which is our HDF5 backend. I have filed an issue (https://github.com/PyTables/PyTables/issues/652), so let's see what they say.
All of this seems to be working fine under Python 3, so that is currently a work-around.
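For anyone who has to stay on Python 2 in the meantime, one option might be to keep non-ASCII characters out of the key names altogether by escaping them before saving and unescaping after loading. The sketch below is plain Python and untested against deepdish or PyTables; escape_key, unescape_key, escape_keys and unescape_keys are hypothetical helpers, not part of either library, and it assumes byte-string keys are UTF-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers; not part of deepdish or PyTables.

TEXT = type(u'')  # unicode on Python 2, str on Python 3


def escape_key(key):
    """Return an ASCII-only native string for a text or UTF-8 byte key."""
    if isinstance(key, bytes):
        key = key.decode('utf-8')  # assumption: byte keys are UTF-8
    if not isinstance(key, TEXT):
        return key  # leave non-string keys (e.g. ints) alone
    escaped = key.encode('unicode_escape')  # ASCII bytes, e.g. b'\\xe9'
    return escaped if str is bytes else escaped.decode('ascii')


def unescape_key(key):
    """Reverse escape_key, returning a text key."""
    if isinstance(key, TEXT):
        key = key.encode('ascii')
    if not isinstance(key, bytes):
        return key
    return key.decode('unicode_escape')


def escape_keys(obj):
    """Recursively escape the keys of nested dictionaries."""
    if isinstance(obj, dict):
        return {escape_key(k): escape_keys(v) for k, v in obj.items()}
    return obj


def unescape_keys(obj):
    """Recursively restore keys escaped by escape_keys."""
    if isinstance(obj, dict):
        return {unescape_key(k): unescape_keys(v) for k, v in obj.items()}
    return obj


d = {u'foo': 1, u'sub': {u'bar': u'a string', u'\xe9': 1.23}}
safe = escape_keys(d)
restored = unescape_keys(safe)
```

The escaped dictionary would then be what gets passed to dd.io.save, with unescape_keys applied to the result of dd.io.load. The escaped names contain backslashes, so PyTables would presumably still emit a NaturalNameWarning for them.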
Thanks a lot for jumping on this so quickly, @gustavla. I will follow the PyTables issue closely. I'm using Python 3 for now as a workaround.