msgpack-python icon indicating copy to clipboard operation
msgpack-python copied to clipboard

Segmentation fault when calling getbuffer() on Packer object

Open thibaudmartinez opened this issue 3 years ago • 9 comments

Hi there,

I'm trying to get the internal data of the Packer object in order to avoid unneeded copying, as documented here.

The code is as follow :

import msgpack

def do_the_job():
  packer = msgpack.Packer(autoreset=False)
  packer.pack(1)
  return packer.getbuffer()

bytes(do_the_job())

When running this snippet, I get the following error :

[1]    9018 segmentation fault (core dumped)  python script.py

I am using Ubuntu 18.04.5 LTS together with msgpack 1.0.2.

Thanks in advance for your help and for your work on this package !

thibaudmartinez avatar Jun 15 '21 10:06 thibaudmartinez

Thank you for report with snippets. But I can not reproduce it.

methane avatar Jun 16 '21 04:06 methane

Thank you for your reply.

I was able to reproduce the bug using the following Debian Docker image.

FROM python:3.9.5-slim-buster

RUN pip install --no-input msgpack==1.0.2

RUN echo "\
import msgpack\n\
\n\
def do_the_job():\n\
    packer = msgpack.Packer(autoreset=False)\n\
    packer.pack(1)\n\
    return packer.getbuffer()\n\
\n\
bytes(do_the_job())\
" > script.py

Can you try to reproduce the bug with the above setup?

docker build . -t test-msgpack
docker run -it test-msgpack bash
python script.py

thibaudmartinez avatar Jun 16 '21 08:06 thibaudmartinez

Thank you, I can reproduce it now! I will investigate it in tomorrow.

methane avatar Jun 16 '21 10:06 methane

I got it. Memoryview object don't have reference to Packer. When do_the_job() return, Packer object is released and Memoryview references "after free" memory.

methane avatar Jun 16 '21 11:06 methane

Great! Do you think it will be easy to fix in an upcoming release?

thibaudmartinez avatar Jun 17 '21 14:06 thibaudmartinez

It's not easy to fix. I can not release the fix anytime soon. You shouldn't use the buffer after the Packer is released. It is unintended usage.

2021年6月17日(木) 23:04 Thibaud Martinez @.***>:

Great! Do you think it will be easy to fix in an upcoming release?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/msgpack/msgpack-python/issues/479#issuecomment-863267580, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABQXKBKC2YKCVDUSYJS4B3TTH6G5ANCNFSM46W57B6Q .

methane avatar Jun 17 '21 14:06 methane

I think this can be fixed relatively easily. Replace PyMemoryView_FromMemory (here) with PyMemoryView_FromBuffer(Py_buffer *view) where view.obj points to self and incref self, though I am not entirely clear on this bit in the docs:

As a special case, for temporary buffers that are wrapped by PyMemoryView_FromBuffer() or PyBuffer_FillInfo() this field is NULL. In general, exporting objects MUST NOT use this scheme.

In particular, PyBuffer_FillInfo() states:

On success, set view->obj to a new reference to exporter and return 0. Otherwise, raise PyExc_BufferError, set view->obj to NULL and return -1;

So keeping a reference to the exporting object is exactly the use case for this field.

jfolz avatar Jun 17 '21 21:06 jfolz

See here. https://github.com/python/cpython/blob/7f01f77f8fabcfd7ddb5d99f12d6fc99af9af384/Objects/memoryobject.c#L775-L776

The best way is implementing the buffer protocol. https://cython.readthedocs.io/en/latest/src/userguide/buffer.html

methane avatar Jun 18 '21 01:06 methane

Oh wow... that's really convoluted. So you need to implement __getbuffer__ and __releasebuffer__ magic methods, and getbuffer can then do memoryview(self).

jfolz avatar Jun 18 '21 06:06 jfolz