Encoding an empty unicode array would produce an array of the wrong dtype
Calling numpy.char.encode on an empty unicode array creates a float64 array instead of an array of S dtype.
Reproducing code example:
import numpy
print(numpy.char.encode(numpy.array([], 'U'), 'utf8').dtype)
# This would output:
# float64
I would expect an empty S1 array.
Error message:
The dtype returned seems wrong.
Numpy/Python version information:
>>> import sys, numpy; print(numpy.__version__, sys.version)
1.16.2 3.7.2 (default, Dec 29 2018, 06:19:36)
[GCC 7.3.0]
This was run in a fresh conda environment (I just did a "conda create -n test_numpy python=3.7 numpy"). The problem seems to exist in earlier numpy as well (1.15).
The shape also gets messed up. For example:
print(numpy.char.encode(numpy.array([], 'U').reshape((1, 0, 1)), 'utf8').shape)
prints (1, 0) instead of the original shape (1, 0, 1).
Decode is also affected by this bug btw.
The bug is in _to_string_or_unicode_array, which impacts all of:
- mod
- decode
- encode
- expandtabs
- join
- partition
- replace
- rpartition
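To illustrate why the helper loses both the dtype and the shape: as far as I can tell from the 1.16 source, it rebuilds the result with numpy.asarray(result.tolist()). Assuming that is the mechanism, the same symptoms can be reproduced without numpy.char at all, since an empty list carries no type information and no trailing dimensions:

```python
import numpy as np

# An empty unicode array, as in the original report.
empty = np.array([], dtype="U")

# Round-tripping through a Python list discards the dtype:
# np.asarray([]) has nothing to inspect, so it falls back to float64.
roundtrip = np.asarray(empty.tolist())
print(roundtrip.dtype)  # float64

# The same round trip also drops dimensions that follow a zero-length axis:
# shape (1, 0, 1) becomes the nested list [[]], which is read back as (1, 0).
shaped = empty.reshape((1, 0, 1))
roundtrip_shaped = np.asarray(shaped.tolist())
print(roundtrip_shaped.shape)  # (1, 0)
```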
The fix is probably to work out the correct type ahead of time, rather than guessing from the array contents.
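A minimal sketch of that idea, not the actual numpy patch: encode_keep_dtype is a hypothetical helper name, and the choice of S1 for arrays with no content is an assumption based on the expectation stated in the report. The output dtype is decided explicitly and the input shape is reused, so nothing is inferred from a list:

```python
import numpy as np

def encode_keep_dtype(arr, encoding="utf8"):
    """Hypothetical helper: encode a unicode array to an S-dtype array,
    preserving the shape even when the array is empty."""
    arr = np.asarray(arr)

    # Encode element by element into an object array of the same shape.
    encoded = np.empty(arr.shape, dtype=object)
    for idx in np.ndindex(arr.shape):
        encoded[idx] = arr[idx].encode(encoding)

    # Work out the item size up front; fall back to 1 when there is
    # nothing to measure (assumption: S1 is the desired empty dtype).
    width = max((len(b) for b in encoded.flat), default=1)
    out_dtype = np.dtype(f"S{max(width, 1)}")

    # astype keeps the shape, including zero-length axes.
    return encoded.astype(out_dtype)

print(encode_keep_dtype(np.array([], "U")).dtype)
print(encode_keep_dtype(np.array([], "U").reshape((1, 0, 1))).shape)
print(encode_keep_dtype(np.array(["a", "bcd"])))
```

The per-element loop is slow and only there to keep the sketch self-contained; the point is that the dtype and shape are fixed before the result array is built.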
This stackoverflow question is another report of the bug: Why does numpy's np.char.encode turn an empty unicode array into an empty float64 array?
Here's an older issue that reports the same problem: https://github.com/numpy/numpy/issues/7371