UnicodeDecodeError when setting tag units with unit of data array

Open ajkswamy opened this issue 7 years ago • 7 comments

Hi

Here is an example with nixio v1.3.0

import nixio as nix
import numpy as np

nixFile = nix.File.open('test.h5')
blk = nixFile.create_block('TestBlock', 'Test')
da = blk.create_data_array('TestDA', 'Test', data=np.random.rand(50))
da.unit = 'mV'
dim = da.append_sampled_dimension(1)
dim.unit = 's'

tag = blk.create_tag('TestTag', 'Test', position=[10])
tag.extent = [10]
tag.references.append(da)
tag.units = [da.dimensions[0].unit]
nixFile.close()

Here is the traceback:

  Traceback (most recent call last):
  File "tmp/nixioUnicodeBug.py", line 14, in <module>
    tag.units = [da.dimensions[0].unit]
  File "/home/aj/intel/intelpython27/envs/GJEMS/lib/python2.7/site-packages/nixio/pycore/tag.py", line 51, in units
    u = util.units.sanitizer(u)
  File "/home/aj/intel/intelpython27/envs/GJEMS/lib/python2.7/site-packages/nixio/pycore/util/units.py", line 66, in sanitizer
    replace(micro, "u").replace(mugr, "u")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I could get around it with

tag.units = [str(da.dimensions[0].unit)]

Is this expected behavior? Thanks.

ajkswamy avatar Mar 03 '17 16:03 ajkswamy

This error generally occurs when the given data is not valid UTF-8, so check that the data you are providing is properly UTF-8 encoded. I think your data probably contains backquotes.

s0nskar avatar Mar 03 '17 17:03 s0nskar

Sorry for not being clear, but in the example above, I create a completely new file 'test.h5' and create a new block, a new data array and a new tag. The error is reproducible by just running the above code.

ajkswamy avatar Mar 03 '17 17:03 ajkswamy

Hey Ajay.

I think I know what's going on here. This line

    replace(micro, "u").replace(mugr, "u")

is meant to replace the character μ in a unit string with u. It does two replacements since there are two different μ codepoints: one meant to be used as an SI prefix (micro, U+00B5) and the Greek lowercase mu (mugr, U+03BC).

This should work on both Python 2 and 3. Which OS are you running (Ubuntu 14.04, if I remember correctly)? I'll see if I can pinpoint the exact issue.

achilleas-k avatar Mar 04 '17 11:03 achilleas-k
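To make the two-codepoint point concrete, here is a minimal sketch of the same replacement chain with the two characters written as explicit Unicode escapes. It is written for Python 3, and `sanitize_unit`, `MICRO_SIGN` and `GREEK_MU` are hypothetical names chosen for illustration; this is not nixio's actual code or the fix that was eventually applied.

```python
import unicodedata

# The two visually similar characters, written as escapes so the source
# file encoding does not matter.
MICRO_SIGN = "\u00b5"  # SI prefix character
GREEK_MU = "\u03bc"    # Greek lowercase letter mu


def sanitize_unit(unit):
    """Hypothetical stand-in for the sanitizer shown in the traceback.

    Strips spaces and maps "mu", the micro sign and the Greek mu to "u".
    """
    return (
        unit.replace(" ", "")
        .replace("mu", "u")
        .replace(MICRO_SIGN, "u")
        .replace(GREEK_MU, "u")
    )


# The two characters are distinct codepoints with distinct Unicode names.
micro_name = unicodedata.name(MICRO_SIGN)
mu_name = unicodedata.name(GREEK_MU)

print(micro_name, "|", mu_name)
print(sanitize_unit(MICRO_SIGN + "V"), sanitize_unit(GREEK_MU + "V"))
```

Both spellings of "microvolt" end up as the plain ASCII string `uV`, which is why the chain needs one `replace` call per codepoint.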

Hi Achilleas

It seems to be a unicode conversion issue: dim.unit returns a unicode string, which tag.units does not accept. It does accept a normal str, though.

In [5]: dim.unit        
Out[5]: u's'

In [6]: tag.units = [da.dimensions[0].unit]
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-2943b086a69a> in <module>()
----> 1 tag.units = [da.dimensions[0].unit]

/home/aj/intel/intelpython27/lib/python2.7/site-packages/nixio/pycore/tag.pyc in units(self, units)
     49             for u in units:
     50                 util.check_attr_type(u, str)
---> 51                 u = util.units.sanitizer(u)
     52                 if not (util.units.is_si(u) or util.units.is_compound(u)):
     53                     raise InvalidUnit(

/home/aj/intel/intelpython27/lib/python2.7/site-packages/nixio/pycore/util/units.pyc in sanitizer(unit)
     64     mugr = "μ"
     65     return unit.replace(" ", "").replace("mu", "u").\
---> 66         replace(micro, "u").replace(mugr, "u")
     67 
     68 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

In [7]: tag.units =[str(da.dimensions[0].unit)]

ajkswamy avatar Mar 06 '17 09:03 ajkswamy
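The byte 0xc2 in the error message can be traced with a short sketch. It assumes, without confirmation from this thread, that the `micro` literal in `units.py` is a UTF-8 byte string under Python 2, and it relies on Python 2 implicitly decoding a byte-string argument as ASCII when `replace` is called on a `unicode` object. The snippet below uses Python 3 to make that implicit step explicit; `implicit_ascii_decode` is a name invented for illustration.

```python
# The micro sign (U+00B5), one of the two characters the sanitizer replaces.
MICRO_SIGN = "\u00b5"

# Assumption: in a UTF-8 encoded Python 2 source file, a plain "..." literal
# containing this character is stored as these raw bytes.
micro_bytes = MICRO_SIGN.encode("utf-8")


def implicit_ascii_decode(raw):
    """Mimic the ASCII decode Python 2 applies when mixing str with unicode.

    Returns the decoded text, or the error message if decoding fails.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)


message = implicit_ascii_decode(micro_bytes)
print(micro_bytes)
print(message)
```

Under that assumption, the `str(...)` workaround helps because a Python 2 byte-string `replace` with byte-string arguments never triggers the ASCII decode.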

Cool. Thanks for the extra info. I'll try to poke this bug to death later today... or tomorrow. Or soon. Most likely soonish.

achilleas-k avatar Mar 06 '17 11:03 achilleas-k

Hey @achilleas-k, I was trying to run tag = blk.create_tag('TestTag', 'Test', position=[10]) from the code above, but it gives me this error:

ArgumentError                             Traceback (most recent call last)
<ipython-input-10-9cbdb1fe1850> in <module>()
----> 1 tag = blk.create_tag('TestTag', 'Test', position=[10])

ArgumentError: Python argument types in
    Block.create_tag(Block, str, str)
did not match C++ signature:
    create_tag(nix::Block {lvalue}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<double, std::allocator<double> >)

s0nskar avatar Mar 06 '17 11:03 s0nskar

Hello @s0nskar.

That's an issue that arises when using the C++ bindings (backend="hdf5") instead of the pure Python backend (backend="h5py"). The problem here is that some functions simply call the equivalent C++ function in the backend directly, while others have a Python layer that does some preprocessing of the function arguments. It should work if you pass the position positionally instead of as a keyword argument.

On the one hand, this could count as an API incompatibility between the two backends, which might be an issue. On the other hand, it's just the way Python works. It has keyword arguments. I guess we could have a preprocessing for keyword arguments for ALL methods before calling the backend, but that would be some amount of work for little benefit.

achilleas-k avatar Mar 06 '17 15:03 achilleas-k
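As a rough illustration of the keyword-argument preprocessing idea mentioned above, the sketch below binds keyword arguments against a Python-level signature and forwards everything positionally. All names here (`keywords_to_positional`, `create_tag_signature`, `strict_backend_create_tag`) are hypothetical; the stand-in backend only imitates a binding that rejects keywords, and none of this is nixio's actual implementation.

```python
import functools
import inspect


def keywords_to_positional(template):
    """Build a decorator that forwards every call positionally.

    `template` is a plain Python function whose signature names the
    parameters; its body is never executed.
    """
    signature = inspect.signature(template)

    def decorator(backend_func):
        @functools.wraps(backend_func)
        def wrapper(*args, **kwargs):
            # Match positional and keyword arguments to parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Hand the values over in declaration order, without keywords.
            return backend_func(*bound.args)

        return wrapper

    return decorator


def create_tag_signature(block, name, type_, position):
    """Hypothetical signature template; never called directly."""


def strict_backend_create_tag(*args):
    """Stand-in for a binding that accepts exactly four positional arguments."""
    if len(args) != 4:
        raise TypeError("expected 4 positional arguments, got %d" % len(args))
    return args


create_tag = keywords_to_positional(create_tag_signature)(strict_backend_create_tag)
result = create_tag("block", "TestTag", "Test", position=[10])
print(result)
```

The cost is one signature template per wrapped method, which matches the "some amount of work" concern raised in the comment.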