[C API] Add PyBytesWriter API

Open vstinner opened this issue 1 year ago • 2 comments

Feature or enhancement

I propose adding a new PyBytesWriter API to create a Python bytes object.

API:

typedef struct PyBytesWriter PyBytesWriter;

// Create a bytes writer instance.
// On success, set *str and return a new writer.
// On error, set an exception and return NULL.
PyAPI_FUNC(PyBytesWriter*) PyBytesWriter_Create(
    Py_ssize_t size,
    char **str);

// Return the final Python bytes object and destroy the writer instance.
// On success, return a bytes object.
// On error, set an exception and return NULL.
PyAPI_FUNC(PyObject *) PyBytesWriter_Finish(
    PyBytesWriter *writer,
    char *str);

// Discard a writer: deallocate its memory.
PyAPI_FUNC(void) PyBytesWriter_Discard(PyBytesWriter *writer);

// Allocate 'size' bytes to prepare writing into writer.
// On success, return 0.
// On error, set an exception and return -1.
PyAPI_FUNC(int) PyBytesWriter_Prepare(
    PyBytesWriter *writer,
    char **str,
    Py_ssize_t size);

// Write a bytes string into writer.
// On success, return 0.
// On error, set an exception and return -1.
PyAPI_FUNC(int) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    char **str,
    const void *bytes,
    Py_ssize_t size);

The PyBytesWriter writer is responsible for managing memory and uses overallocation to be more efficient. str is a cursor used to write into the bytes string.
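
The cursor-plus-overallocation idea can be illustrated outside of CPython. The sketch below is a toy model in plain C, not the proposed implementation: ToyWriter and the toy_* functions are hypothetical names, malloc()/realloc() stand in for CPython's allocator, and the 25% growth margin is an arbitrary assumption. It only shows why each call takes char **str: when the buffer grows it may move, so the cursor has to be rebased.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyBytesWriter: a growable heap buffer. */
typedef struct {
    char *buf;     /* start of the buffer */
    size_t alloc;  /* number of bytes currently allocated */
} ToyWriter;

/* Create a writer with room for 'size' bytes and set *str to its start. */
ToyWriter *toy_create(size_t size, char **str)
{
    ToyWriter *w = malloc(sizeof(*w));
    if (w == NULL) {
        return NULL;
    }
    w->alloc = size ? size : 1;
    w->buf = malloc(w->alloc);
    if (w->buf == NULL) {
        free(w);
        return NULL;
    }
    *str = w->buf;
    return w;
}

/* Make sure 'size' more bytes can be written at *str.
   If the buffer grows, it is overallocated and *str is rebased.
   On failure, return -1 and leave the writer and *str untouched. */
int toy_prepare(ToyWriter *w, char **str, size_t size)
{
    size_t used = (size_t)(*str - w->buf);
    assert(used <= w->alloc);
    if (size <= w->alloc - used) {
        return 0;  /* enough room already */
    }
    if (used > SIZE_MAX / 2 || size > SIZE_MAX / 2 - used) {
        return -1;  /* growth arithmetic would overflow */
    }
    size_t needed = used + size;
    size_t new_alloc = needed + needed / 4;  /* assumed 25% margin */
    char *new_buf = realloc(w->buf, new_alloc);
    if (new_buf == NULL) {
        return -1;
    }
    w->buf = new_buf;
    w->alloc = new_alloc;
    *str = new_buf + used;  /* the buffer may have moved */
    return 0;
}

/* Append 'size' bytes and advance the cursor. */
int toy_write(ToyWriter *w, char **str, const void *bytes, size_t size)
{
    if (toy_prepare(w, str, size) < 0) {
        return -1;
    }
    memcpy(*str, bytes, size);
    *str += size;
    return 0;
}

/* Destroy the writer and return its buffer trimmed to the written length.
   The caller owns the result and must free() it. */
char *toy_finish(ToyWriter *w, char *str, size_t *len)
{
    *len = (size_t)(str - w->buf);
    char *result = realloc(w->buf, *len ? *len : 1);
    if (result == NULL) {
        result = w->buf;  /* shrinking failed: keep the larger block */
    }
    free(w);
    return result;
}

/* Discard a writer without producing a result. */
void toy_discard(ToyWriter *w)
{
    free(w->buf);
    free(w);
}
```

In this model the final length is never stored in the writer: toy_finish() derives it from the cursor position, which mirrors how PyBytesWriter_Finish() takes str as an argument.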

The implementation is based on the existing private _PyBytesWriter API, which has existed for many years and is used by many functions, such as the ASCII and Latin1 encoders, the binascii, _pickle, and _struct modules, bytes methods, etc.

Linked PRs

  • gh-121726

vstinner avatar Jul 13 '24 14:07 vstinner

Example creating the bytes string b"abcdef" with PyBytesWriter_WriteBytes():

static PyObject* create_string(void)
{
    char *str;
    PyBytesWriter *writer = PyBytesWriter_Create(6, &str);
    if (writer == NULL) {
        return NULL;
    }

    if (PyBytesWriter_WriteBytes(writer, &str, "abc", 3) < 0) {
        goto error;
    }
    if (PyBytesWriter_WriteBytes(writer, &str, "def", 3) < 0) {
        goto error;
    }

    return PyBytesWriter_Finish(writer, str);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

Example creating the bytes string b"Hello." using PyBytesWriter_Prepare():

static PyObject *
test_byteswriter_prepare(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
{
    char *str;
    PyBytesWriter *writer = PyBytesWriter_Create(0, &str);
    if (writer == NULL) {
        return NULL;
    }

    if (PyBytesWriter_Prepare(writer, &str, 6) < 0) {
        PyBytesWriter_Discard(writer);
        return NULL;
    }

    memcpy(str, "Hello", 5);
    str += 5;

    *str++ = '.';

    return PyBytesWriter_Finish(writer, str);
}

vstinner avatar Jul 13 '24 14:07 vstinner

The implementation is based on the existing private _PyBytesWriter API, which has existed for many years

The proposed public API differs from the private API: the writer can still be used after an error occurs, whereas the private API requires the function to discard the writer instance. The public API is also simpler, which makes it less error-prone.

vstinner avatar Jul 13 '24 19:07 vstinner

I decided to reject this API for now. It's too low-level and too error-prone: https://github.com/capi-workgroup/decisions/issues/39#issuecomment-2396888574

vstinner avatar Oct 07 '24 13:10 vstinner