cpython gh-121710: Add PyBytesWriter API

Issue: gh-121710

📚 Documentation preview 📚: https://cpython-previews--121726.org.readthedocs.build/

Jul 13 '24 19:07 vstinner

@picnixz: Such coding style review is not helping. It's the first version of the PR, we didn't discuss the API yet. I would prefer comments about the API rather than 7 comments about spaces and PEP 7. It's too early for that.

Jul 14 '24 08:07 vstinner

Oh ok my bad (I'll try to reduce those kind of reviews in the future). Then, I have a suggestion about maybe having the possibility to expose something for byte substrings (similar to PyUnicodeWriter_WriteSubstring) as well as format strings like PyUnicodeWriter_Format and possibly 1-char bytes. While all those functions are equivalent PyBytesWriter_WriteBytes calls, if I were to ask for a specific one, it would be PyUnicodeWriter_Format which could be very useful for a public API (I don't think the V version is needed though since it's not exposed in the Unicode API).

So, ideally:

PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)
PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *) which accepts PyObject * arguments
PyBytesWriter_WriteBuffer(PyBytesWriter *, char **, const void *, Py_ssize_t) which accepts const void * arguments (the current WriteBytes)
PyBytesWriter_WriteChar(PyBytesWriter *, char **, char) -- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS*`)
PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t) (or any other name) for writing a sub-range of bytes object.

(Again, sorry for my nitpicky review).

Jul 14 '24 08:07 picnixz

@picnixz: I would prefer to make the API as small as possible. Where do these use cases come from? Did you see usage in the current C code of CPython with the _PyBytesWriter API?

In your list, the most appealing is PyBytesWriter_Format() :-)

PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)

While it's tempting to add as many features as PyUnicodeWriter API, I'm not convinced that this function is needed right now.

PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *) which accepts PyObject * arguments

You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes), PyBytes_Size(bytes)). I never had to "write" a Python bytes object when using the _PyBytesWriter API. I'm not convinced that it's a common use case.

PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t) (or any other name) for writing a sub-range of bytes object.

You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes) + start, PyBytes_Size(bytes) - skip) to write a substring. Again, I'm not convinced that it's common to write Python bytes objects.

PyBytesWriter_WriteChar(PyBytesWriter *, char *, char) -- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS`)

PyBytesWriter is not PyUnicodeWriter. The intended usage is to allocate enough bytes and then just use str pointer. Example:

    ascii_data = PyBytesWriter_Prepare(&writer, 10);
    *ascii_data++ = ' ';
    memcpy(ascii_data, "abc", 3); ascii_data += 3;
    *ascii_data++ = '-';
    ...

WriteChar() would be way slower.

Jul 14 '24 09:07 vstinner

Errr.. my idea was to have a parity with the PyUnicodeWriter API, but if you want to make it as small as possible, I'm fine with it (I never had to use the private API to write a bytes string either and I didn't check the code to see occurrences of it actually). I think the PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...) would be then the only one I would advocate for.

Jul 14 '24 10:07 picnixz

Errr.. my idea was to have a parity with the PyUnicodeWriter API

Look at the current usage of _PyBytesWriter. You will see that the code is very different. The hot code doesn't use the writer but just "str" pointer. The writer is just there to manage memory.

PyUnicodeWriter is used for other use cases and so it deserves a different API.

Jul 14 '24 16:07 vstinner

PR rebased on main to fix a merge conflict.

Jul 27 '24 19:07 vstinner

I created a C API Working Group issue: https://github.com/capi-workgroup/decisions/issues/39

Aug 05 '24 13:08 vstinner

Microbenchmark comparing bytearray to PyBytesWriter API:

Result:

$ env/bin/python bench.py
bytearray: Mean +- std dev: 93.0 ns +- 1.4 ns
byteswriter: Mean +- std dev: 23.4 ns +- 0.9 ns

PyBytesWriter is 4x faster (93.0 => 23.4 ns) than using bytearray.

Aug 07 '24 09:08 vstinner

It seems that all the needed information could fit inside the memory of a PyBytesObject:

buffer -> _Py_CAST(PyBytesObject*, writer)->ob_sval
allocated -> Py_SIZE(writer)
min_size -> unused: the user can be track this separately
overallocate - unused (always 1 in this PR)
use_small_buffer, small_buffer - unused: small buffer is just an initial overallocation
use_bytearray - unused
the “current position” is tracked separately in str.

With that we could avoid allocating the big struct. With a very quick and dirty proof of concept, I get reasonable results, which suggest this could be useful for alternative implementations or future CPython versions:

This PR:
byteswriter: Mean +- std dev: 15.4 ns +- 0.6 ns
My proof of concept:
byteswriter: Mean +- std dev: 17.3 ns +- 1.0 ns

For the API, it would mean that PyBytesWriter_Prepare and PyBytesWriter_WriteBytes can reallocate the writer itself, and would take a PyBytesWriter **. That shouldn't hurt with the _PyBytesWriter implementation.

Aug 08 '24 13:08 encukou

min_size -> unused: the user can be track this separately

It's used if you call Prepare() multiple times.

use_small_buffer, small_buffer - unused: small buffer is just an initial overallocation

It's used. It does defer the Python bytes object until Finish() which can avoid having to call the inefficient _PyBytes_Resize() function.

With that we could avoid allocating the big struct.

Why do you want to avoid that? I would like to reuse the existing private _PyBytesWriter code which is well tested. According to benchmarks, allocating PyBytesWriter on the heap memory has no significant cost.

Aug 09 '24 10:08 vstinner

Why do you want to avoid that?

It would be nice to keep the possibility open for other implementations, if possible.

I would like to reuse the existing private _PyBytesWriter code which is well tested.

But, this is not exposing all of the private interface, and I'm not sure if the existing tests apply to what's exposed here. In particular, the public API doesn't have:

disabling overallocation
a writable min_size field
bytearray support

An alternate implementation would not need the machinery for those features, and could do without min_size and a separate small_buffer.

Aug 21 '24 09:08 encukou

I decided to reject this API for now. It's too low-level and too error prone: https://github.com/capi-workgroup/decisions/issues/39#issuecomment-2396888574

Oct 07 '24 13:10 vstinner