
UnicodeDecodeError when using Python 2.7

Open jplu opened this issue 6 years ago • 4 comments

Hello,

I'm using Deepdish to save a dictionary that has unicode strings as keys and numpy arrays (computed embeddings) as values. Here is a small example that reproduces the exception:

import deepdish as dd
import numpy as np
d = {'foo': np.ones((10, 20)), 'sub': {'bar': 'a string', 'é': 1.23}}
dd.io.save('test.h5', d)

And the raised exception is:

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/path.py:112: NaturalNameWarning: object name is not a valid Python identifier: '\xc3\xa9'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 584, in save
    filters=filters, idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 212, in _save_level
    idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 297, in _save_level
    setattr(group._v_attrs, name, level)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 481, in __setattr__
    self._g__setattr(name, value)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 423, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "tables/hdf5extension.pyx", line 658, in tables.hdf5extension.AttributeSet._g_setattr (tables/hdf5extension.c:7458)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Any idea how to overcome this problem?

Thanks!

jplu avatar Oct 19 '17 16:10 jplu

I have the same problem. You have to encode every unicode key before saving: u'é'.encode('utf-8')

asanakoy avatar Oct 19 '17 16:10 asanakoy

It is not working. Here is what I did:

import deepdish as dd
import numpy as np
d = {'foo': np.ones((10, 20)), 'sub': {'bar': 'a string', u'é'.encode("utf-8"): 1.23}}
dd.io.save('test.h5', d)

And I get the same exception.
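
A likely reason the encoding step changes nothing (an interpretation of the traceback, not something confirmed in this thread): the warning above shows the key as '\xc3\xa9', so the plain 'é' literal was already a UTF-8 byte string, and u'é'.encode('utf-8') yields exactly the same bytes. The error then appears when those bytes are decoded with the ASCII codec. The sketch below reproduces only that decoding step with the standard library; it does not call deepdish or PyTables, and the variable names are illustrative.

```python
# Minimal sketch, standard library only (no deepdish or PyTables involved).
# u'\xe9' is the escaped form of the character 'é'.
key_bytes = u'\xe9'.encode('utf-8')

# The encoded key is the same two bytes shown in the NaturalNameWarning.
same_as_warning = (key_bytes == b'\xc3\xa9')

# Decoding those bytes as ASCII fails on the first byte, 0xc3.
try:
    key_bytes.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(same_as_warning)
print(message)
```

If this reading is right, both versions of the dictionary hold identical keys, which would explain why the exception is the same.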

jplu avatar Oct 19 '17 16:10 jplu

Thank you so much for finding this issue! I don't use Python 2 much, so I am happy that this has been identified.

So far, I have detected two issues with unicode under Python 2. One was a bug of mine that meant you could not save using unicode group names. This has just been fixed, so you can simply run pip install -U deepdish.

However, the other problem is reading files with unicode group names, and this seems to be an issue with PyTables, which is our HDF5 backend. I have filed an issue (https://github.com/PyTables/PyTables/issues/652), so let's see what they say.

All of this seems to be working fine under Python 3, so for now that is a workaround.

gustavla avatar Oct 21 '17 20:10 gustavla

Thanks a lot for jumping on this so quickly, @gustavla. I will follow the PyTables issue closely. I'm using Python 3 for now as a workaround.

jplu avatar Oct 23 '17 08:10 jplu