fromdataframe and Cyrillic strings

Open • Winand opened this issue 9 years ago • 1 comment

With pandas 0.17.1 and bcolz 0.12.1, fromdataframe fails on non-Latin-1 strings.

>>> bcolz.ctable.fromdataframe(pd.DataFrame(["Now fail: Привет"]))
Traceback (most recent call last):
  File "D:\andray\Software\WinPython-32bit-3.4.3.7\python-3.4.3\lib\site-packages\bcolz\utils.py", line 121, in to_ndarray
    array = np.array(array, dtype=dtype.base)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\andray\Software\WinPython-32bit-3.4.3.7\python-3.4.3\lib\site-packages\bcolz\ctable.py", line 661, in fromdataframe
    col = bcolz.carray(vals, dtype='S%d' %
  File "bcolz/carray_ext.pyx", line 1022, in bcolz.carray_ext.carray.__cinit__ (bcolz\carray_ext.c:13692)
  File "bcolz/carray_ext.pyx", line 1091, in bcolz.carray_ext.carray._create_carray (bcolz\carray_ext.c:14452)
  File "D:\andray\Software\WinPython-32bit-3.4.3.7\python-3.4.3\lib\site-packages\bcolz\utils.py", line 123, in to_ndarray
    raise ValueError("cannot convert to an ndarray object")
ValueError: cannot convert to an ndarray object

Winand, Jan 11 '16 09:01

It looks like the import machinery assumes a plain (byte) string, not a Unicode one. This is indeed a bug. You are welcome to submit a pull request to speed up the fix.
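
Until that is fixed, one possible workaround is to encode the text to bytes yourself before handing it over. The traceback shows the column being built with dtype='S%d', and the implicit conversion of str to that bytes dtype goes through the ASCII codec, which cannot represent Cyrillic. The sketch below uses only the standard library; encode_column is a hypothetical helper, not part of bcolz or pandas, and the choice of UTF-8 is an assumption.

```python
# Hypothetical helper, not part of bcolz: encode text to bytes up front so
# that a fixed-width bytes dtype ('S<n>') can hold it.

def encode_column(values, encoding="utf-8"):
    """Return (list of encoded values, byte width of the longest one)."""
    encoded = [v.encode(encoding) for v in values]
    width = max((len(b) for b in encoded), default=1)
    return encoded, width


text = "Now fail: Привет"

# The implicit conversion that fails: ASCII cannot represent Cyrillic.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Explicit UTF-8 encoding succeeds; the width is measured in bytes,
# not in characters.
encoded, width = encode_column([text])
dtype_str = "S%d" % width
```

Note that the width has to be counted in bytes: the 16-character string above needs 22 bytes in UTF-8, because each Cyrillic letter takes two. Values stored this way come back as bytes and need an explicit .decode("utf-8") when read.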

FrancescAlted, Jan 11 '16 09:01