msgpack-numpy

Problem with unpacking packed multi-dimensional array

Open Bomsw opened this issue 1 year ago • 3 comments

I've been trying to use the package with multi-dimensional numpy arrays, but unpacking a packed array fails:

>>> import numpy as np
>>> import msgpack_numpy
>>> packed = msgpack_numpy.packb(np.ones((3,3)))
>>> msgpack_numpy.unpackb(packed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/msgpack_numpy.py", line 287, in unpackb
    return _unpackb(packed, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/msgpack/fallback.py", line 121, in unpackb
    ret = unpacker._unpack()
  File "/usr/local/lib/python3.9/dist-packages/msgpack/fallback.py", line 602, in _unpack
    ret = self._object_hook(ret)
  File "/usr/local/lib/python3.9/dist-packages/msgpack_numpy.py", line 103, in decode
    return np.ndarray(buffer=obj[b'data'],
TypeError: buffer is too small for requested array

I am using numpy 1.24.4, msgpack-numpy 0.4.8 and msgpack 0.5.6 on Python 3.9.5.

Bomsw, Jul 21 '24 17:07

I can't seem to reproduce this:

❯ pip freeze | egrep '(msgpack-numpy|msgpack-python|numpy)'
msgpack-numpy==0.4.8
msgpack-python==0.5.6
numpy==1.24.4
❯ python --version
Python 3.9.5
❯ python -c 'import msgpack_numpy, numpy; x=numpy.ones((3,3)); s=msgpack_numpy.packb(x); print(msgpack_numpy.unpackb(s))'
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Can you try reinstalling all of the packages (msgpack-python, numpy, msgpack-numpy) from scratch?
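Before reinstalling, it may help to confirm which distributions are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical. Note that, as far as I know, msgpack-python is the older distribution name for msgpack and both provide the same `msgpack` import package, so having both installed can leave it unclear which code is being imported.

```python
from importlib import metadata

# Distribution names mentioned in this thread.
PACKAGES = ("msgpack", "msgpack-python", "msgpack-numpy", "numpy")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If both msgpack and msgpack-python show a version, removing both and installing only one of them is probably the cleanest way to rule out a mixed install.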

lebedov, Jul 22 '24 12:07

I'm having similar problems:

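        import msgpack
        import msgpack_numpy
        import numpy as np

        # 2x2 float64 array: 4 values * 8 bytes = 32 bytes of data
        data_to_pack = np.array([[0.5, 0.5],
                                 [0.29983, 0.29983]])
        packed_packet = msgpack.packb([38192.1234, data_to_pack], default=msgpack_numpy.encode)
        # Raises "TypeError: buffer is too small for requested array"
        msgpack.unpackb(packed_packet, object_hook=msgpack_numpy.decode)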

Error output:

                       descr = [tuple(tostr(t) if type(t) is bytes else t for t in d) \
                                 for d in obj[b'type']]
                    elif b'kind' in obj and obj[b'kind'] == b'O':
                        return pickle.loads(obj[b'data'])
                    else:
                        descr = obj[b'type']
    
>                   return np.ndarray(buffer=obj[b'data'],
                                      dtype=_unpack_dtype(descr),
                                      shape=obj[b'shape'])
                                     TypeError: buffer is too small for requested array

This is obj as a string:

"{b'nd': True, b'type': b'<f8', b'kind': b'', b'shape': [2, 2], b'data': b'\x00\x00\x00\x00\x00\x00\xe0?\x00\x00\x00\x00\x00\x00\xe0?'}"

It's weird that the data is mostly zeros.

data_to_pack.tobytes() returns b'\x00\x00\x00\x00\x00\x00\xe0?\x00\x00\x00\x00\x00\x00\xe0?\xff\x04\x17+j0\xd3?\xff\x04\x17+j0\xd3?', which is 32 bytes. However, when this line runs (line 560 in msgpack/fallback.py):

        typ, n, obj = self._read_header(execute)

the returned obj is b'\\x00\\x00\\x00\\x00\\x00\\x00\\xe0?\\x00\\x00\\x00\\x00\\x00\\x00\\xe0?', which is only the first 16 bytes. The issue seems to be in lines 359 to 363 of msgpack/fallback.py:

            n = b & 0b00011111
            typ = TYPE_RAW
            if n > self._max_str_len:
                raise UnpackValueError("%s exceeds max_str_len(%s)", n, self._max_str_len)
            obj = self._read(n)

I don't know enough to figure out what the issue is, but it seems to be related to n = b & 0b00011111, which gives n = 16 when it should be 32.
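To make the 16 versus 32 arithmetic concrete, here is a small sketch that uses only numpy (no msgpack involved). It assumes the shape and dtype from the obj dump above and shows that a 16-byte buffer triggers the same TypeError, while the full 32 bytes decode fine.

```python
import numpy as np

shape = (2, 2)
dtype = np.dtype("<f8")

# Bytes the decoder needs: 2 * 2 elements * 8 bytes each.
expected_nbytes = int(np.prod(shape)) * dtype.itemsize

full = np.array([[0.5, 0.5], [0.29983, 0.29983]], dtype=dtype).tobytes()
truncated = full[:16]  # only the first row, as in the obj dump above

# The full buffer reconstructs the array without problems.
ok = np.ndarray(buffer=full, dtype=dtype, shape=shape)

# The truncated buffer cannot hold a 2x2 float64 array.
try:
    np.ndarray(buffer=truncated, dtype=dtype, shape=shape)
    error = None
except TypeError as exc:
    error = str(exc)

print(expected_nbytes, len(full), len(truncated), error)
```

So the decoder itself looks fine here; the data was already cut short by the time it was written into the packed message.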

My dependencies:

msgpack==1.1.0
msgpack-numpy==0.4.8
msgpack-python==0.5.6

jinder1s, Mar 25 '25 22:03

After a bit more debugging, this seems to be the root of the problem:

https://github.com/lebedov/msgpack-numpy/blob/20c5e5b4730d910ce3b51433598f978bd5dbb12e/msgpack_numpy.py#L30

                return obj.data if obj.flags['C_CONTIGUOUS'] else obj.tobytes()

For this case, obj.flags['C_CONTIGUOUS'] is True, so obj.data is returned. obj.data is a memoryview, and len(obj.data) returns the number of rows rather than the number of bytes. As a result, n = len(obj.data) * itemsize == 16 instead of 32.
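The memoryview behaviour described above can be checked with numpy alone. This is a minimal sketch using the same 2x2 array; the variable names are only for illustration.

```python
import numpy as np

arr = np.array([[0.5, 0.5], [0.29983, 0.29983]])
view = arr.data  # memoryview over the array's buffer

rows = len(view)             # length of the first dimension, not a byte count
itemsize = view.itemsize     # bytes per float64 element
undercount = rows * itemsize # the size a "len * itemsize" calculation yields
nbytes = view.nbytes         # actual size of the buffer in bytes
as_bytes = arr.tobytes()     # flat copy of all the data

print(rows, itemsize, undercount, nbytes, len(as_bytes))
```

For a 1-D array `len(view) * view.itemsize` and `view.nbytes` agree, which would explain why only multi-dimensional arrays are affected.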

My quick suggestion would be to remove the flags check and always use obj.tobytes(), but I don't know how that would affect everything else.
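As a possible workaround that does not require patching the library, one could wrap the encoder so that any memoryview in its output is converted to bytes before msgpack sees it. This is only a sketch under assumptions: `bytes_safe` is a hypothetical helper, and `fake_encode` is a stand-in that mimics the shape of the dict shown in the obj dump above so the example runs without msgpack installed.

```python
import numpy as np


def bytes_safe(encode):
    """Wrap an encoder so memoryview values in its dict output become bytes."""
    def wrapper(obj):
        encoded = encode(obj)
        if isinstance(encoded, dict):
            return {
                key: value.tobytes() if isinstance(value, memoryview) else value
                for key, value in encoded.items()
            }
        return encoded
    return wrapper


def fake_encode(obj):
    """Stand-in encoder (hypothetical): returns a memoryview for ndarrays."""
    if isinstance(obj, np.ndarray):
        return {b"nd": True, b"shape": obj.shape, b"data": obj.data}
    return obj


encoded = bytes_safe(fake_encode)(np.ones((3, 3)))
print(type(encoded[b"data"]), len(encoded[b"data"]))
```

In principle the wrapper could be passed as `default=bytes_safe(msgpack_numpy.encode)`, though I have not verified that against the real library. The trade-off is an extra copy of the array data, which is presumably what the flags check was meant to avoid.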

This is likely only an issue on Windows.

jinder1s, Mar 25 '25 22:03