
Assign data fields from bytearray

Open olofk opened this issue 8 years ago • 3 comments

I have a piece of proprietary code with a python2 wrapper that gives me data as a Python bytearray, but it seems I need to cast this to bytes in order to put it in a pycapnp data field. As this is quite a large buffer, the conversion to bytes affects performance quite a bit. I can't find a way to convert this to bytes without an extra copy, so I wonder if it would be possible for pycapnp to directly support initializing data fields from bytearray. Or is this just moving the problem into pycapnp instead? :)
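
To illustrate where the extra copy comes from, here is a minimal sketch in plain Python 3 (no pycapnp involved; the buffer size and variable names are made up for the example). Converting a bytearray to bytes allocates a new object, while a memoryview refers to the same memory:

```python
# Hypothetical 1 KiB buffer standing in for the data from the wrapper.
buf = bytearray(1024)

as_bytes = bytes(buf)    # allocates a new object and copies all 1024 bytes
view = memoryview(buf)   # no copy: refers to the bytearray's own memory

# Mutate the original buffer after both objects were created.
buf[0] = 0xFF

copy_sees_change = as_bytes[0] == 0xFF   # the copy is unaffected
view_sees_change = view[0] == 0xFF       # the view reflects the change
```

This only shows the copy on the Python side; whether pycapnp could accept such a view directly is the open question of this issue.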

olofk avatar May 29 '17 14:05 olofk

The same problem arises when trying to write to a Data field with a Python memoryview. It would be feasible to get access to the underlying pointer and copy directly from there without requiring the intermediate copy to a bytes object.
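
As a rough sketch of what "access to the underlying pointer" means, the standard-library ctypes module can expose the address of a writable buffer without copying it. This is only an illustration of the idea in pure Python, not how pycapnp works internally; the buffer contents are made up:

```python
import ctypes

buf = bytearray(b"hello world")

# from_buffer() shares memory with any writable buffer-protocol object,
# so this ctypes array is a window onto buf, not a copy of it.
c_array = (ctypes.c_char * len(buf)).from_buffer(buf)
address = ctypes.addressof(c_array)  # raw address of the bytearray's storage

# Writing through the ctypes array changes the original bytearray.
c_array[0] = b"H"

# Reading from the raw address (string_at itself makes a copy, used here
# only to show that the address points at the same data).
raw = ctypes.string_at(address, len(buf))
```

A memoryview over a writable, contiguous buffer can be passed to from_buffer() in the same way.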

tbenthompson avatar Feb 28 '18 04:02 tbenthompson

@tbenthompson Python has a pretty comprehensive C-level API, the buffer protocol, for doing exactly this – PEP 3118 describes the most recent iteration.
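
For anyone who wants to poke at this from Python before touching C, the built-in memoryview exposes most of the fields that PEP 3118 defines for the Py_buffer struct. A small sketch, using an array.array purely as an example exporter:

```python
import array

# Any object that exports the buffer protocol works here; array.array is
# just a convenient standard-library example.
arr = array.array("i", [1, 2, 3, 4])
view = memoryview(arr)

info = {
    "format": view.format,        # struct-style typecode of each item
    "itemsize": view.itemsize,    # bytes per item (platform dependent)
    "shape": view.shape,          # number of items per dimension
    "strides": view.strides,      # bytes to step per dimension
    "nbytes": view.nbytes,        # total size of the exposed memory
    "readonly": view.readonly,    # whether writes are permitted
    "contiguous": view.contiguous,
}
```

These mirror the format, itemsize, shape, strides, len and readonly members that a C extension sees after calling PyObject_GetBuffer().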

I have used it myself for implementing shared-reference, close-to-zero-copy data transformations in Python C++ extensions. I used abstract struct bases called byte_source and byte_sink to wrap the Py_buffer struct – the KJ library has its own (much better) implementations that could be readily extended for the task.

One doesn’t need to implement everything as a C++ extension module to take advantage of this: the structures that PEP 3118 describes have been addressed quite well by the Cython project, whose standard library allows for ready access to the C-level memory-description API – but without having to count references or muck around with PyArg_ParseTuple(…).

Cython itself builds on the PEP 3118 stuff – one of the most excellent APIs Cython offers is something they call “typed memoryviews” – which seamlessly unites the I/O operations for C-style arrays, NumPy ndarray derivatives, Python’s built-in array types and sequences, and anything PEP 3118 covers.

If anyone is at all interested, I also wrote a PEP-3118-compliant parser for the struct module typecodes used to describe Py_buffer memory shapes. It uses C++14 – here’s the header and the source; anyone who has worked with NumPy’s dtype system will feel comfortable dealing with the Python structcode DSL.
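
To give a feel for the structcode DSL without the C++ parser, here is a short sketch using only the standard library: the same typecodes that the struct module understands can be used to reinterpret a byte buffer in place via memoryview.cast(). The values are arbitrary:

```python
import struct

# Pack four native doubles into a writable byte buffer.
raw = bytearray(struct.pack("4d", 1.0, 2.0, 3.0, 4.0))

# Reinterpret the same memory as doubles; "d" is the struct typecode.
# cast() does not copy, it only changes how the bytes are described.
doubles = memoryview(raw).cast("d")
values_before = doubles.tolist()

# Writing through the typed view updates the underlying bytes.
doubles[1] = 20.0
second = struct.unpack_from("d", raw, struct.calcsize("d"))[0]
```

Note that memoryview.cast() only accepts single native typecodes; the fuller struct syntax from PEP 3118 is what a dedicated parser would be needed for.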

Anyway – this was probably too much blather on the subject, but it’s something I am familiar with, and I enjoy working with it. Let me know if I can help with this.

(N.B. the source of mine that I linked was originally written for a Python 2.7-based project – but that shouldn’t make a difference in most PEP 3118-related cases.)

fish2000 avatar Feb 26 '20 02:02 fish2000

Note that once you figure out how to tunnel this through Python/Cython, you'll also want to know how to add an existing byte array to a capnp message in C++, without copying. The trick here is to use Orphanage::referenceExternalData() (messageBuilder.getOrphanage().referenceExternalData()).

kentonv avatar Feb 28 '20 19:02 kentonv