gh-121710: Add PyBytesWriter API
- Issue: gh-121710
📚 Documentation preview 📚: https://cpython-previews--121726.org.readthedocs.build/
@picnixz: Such coding style review is not helping. It's the first version of the PR, we didn't discuss the API yet. I would prefer comments about the API rather than 7 comments about spaces and PEP 7. It's too early for that.
Oh ok my bad (I'll try to reduce those kind of reviews in the future). Then, I have a suggestion about maybe having the possibility to expose something for byte substrings (similar to PyUnicodeWriter_WriteSubstring) as well as format strings like PyUnicodeWriter_Format and possibly 1-char bytes. While all those functions are equivalent PyBytesWriter_WriteBytes calls, if I were to ask for a specific one, it would be PyUnicodeWriter_Format which could be very useful for a public API (I don't think the V version is needed though since it's not exposed in the Unicode API).
So, ideally:
PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *)which acceptsPyObject *argumentsPyBytesWriter_WriteBuffer(PyBytesWriter *, char **, const void *, Py_ssize_t)which acceptsconst void *arguments (the current WriteBytes)PyBytesWriter_WriteChar(PyBytesWriter *, char **, char)-- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS*`)PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t)(or any other name) for writing a sub-range of bytes object.
(Again, sorry for my nitpicky review).
@picnixz: I would prefer to make the API as small as possible. Where do these use cases come from? Did you see usage in the current C code of CPython with the _PyBytesWriter API?
In your list, the most appealing is PyBytesWriter_Format() :-)
PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)
While it's tempting to add as many features as PyUnicodeWriter API, I'm not convinced that this function is needed right now.
PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *) which accepts PyObject * arguments
You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes), PyBytes_Size(bytes)). I never had to "write" a Python bytes object when using the _PyBytesWriter API. I'm not convinced that it's a common use case.
PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t) (or any other name) for writing a sub-range of bytes object.
You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes) + start, PyBytes_Size(bytes) - skip) to write a substring. Again, I'm not convinced that it's common to write Python bytes objects.
PyBytesWriter_WriteChar(PyBytesWriter *, char *, char) -- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS`)
PyBytesWriter is not PyUnicodeWriter. The intended usage is to allocate enough bytes and then just use str pointer. Example:
ascii_data = PyBytesWriter_Prepare(&writer, 10);
*ascii_data++ = ' ';
memcpy(ascii_data, "abc", 3); ascii_data += 3;
*ascii_data++ = '-';
...
WriteChar() would be way slower.
Errr.. my idea was to have a parity with the PyUnicodeWriter API, but if you want to make it as small as possible, I'm fine with it (I never had to use the private API to write a bytes string either and I didn't check the code to see occurrences of it actually). I think the PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...) would be then the only one I would advocate for.
Errr.. my idea was to have a parity with the PyUnicodeWriter API
Look at the current usage of _PyBytesWriter. You will see that the code is very different. The hot code doesn't use the writer but just "str" pointer. The writer is just there to manage memory.
PyUnicodeWriter is used for other use cases and so it deserves a different API.
PR rebased on main to fix a merge conflict.
I created a C API Working Group issue: https://github.com/capi-workgroup/decisions/issues/39
Microbenchmark comparing bytearray to PyBytesWriter API:
Result:
$ env/bin/python bench.py
bytearray: Mean +- std dev: 93.0 ns +- 1.4 ns
byteswriter: Mean +- std dev: 23.4 ns +- 0.9 ns
PyBytesWriter is 4x faster (93.0 => 23.4 ns) than using bytearray.
It seems that all the needed information could fit inside the memory of a PyBytesObject:
buffer->_Py_CAST(PyBytesObject*, writer)->ob_svalallocated->Py_SIZE(writer)min_size-> unused: the user can be track this separatelyoverallocate- unused (always 1 in this PR)use_small_buffer,small_buffer- unused: small buffer is just an initial overallocationuse_bytearray- unused- the “current position” is tracked separately in
str.
With that we could avoid allocating the big struct. With a very quick and dirty proof of concept, I get reasonable results, which suggest this could be useful for alternative implementations or future CPython versions:
This PR:
byteswriter: Mean +- std dev: 15.4 ns +- 0.6 ns
My proof of concept:
byteswriter: Mean +- std dev: 17.3 ns +- 1.0 ns
For the API, it would mean that PyBytesWriter_Prepare and PyBytesWriter_WriteBytes can reallocate the writer itself, and would take a PyBytesWriter **. That shouldn't hurt with the _PyBytesWriter implementation.
min_size -> unused: the user can be track this separately
It's used if you call Prepare() multiple times.
use_small_buffer, small_buffer - unused: small buffer is just an initial overallocation
It's used. It does defer the Python bytes object until Finish() which can avoid having to call the inefficient _PyBytes_Resize() function.
With that we could avoid allocating the big struct.
Why do you want to avoid that? I would like to reuse the existing private _PyBytesWriter code which is well tested. According to benchmarks, allocating PyBytesWriter on the heap memory has no significant cost.
Why do you want to avoid that?
It would be nice to keep the possibility open for other implementations, if possible.
I would like to reuse the existing private
_PyBytesWritercode which is well tested.
But, this is not exposing all of the private interface, and I'm not sure if the existing tests apply to what's exposed here. In particular, the public API doesn't have:
- disabling overallocation
- a writable
min_sizefield bytearraysupport
An alternate implementation would not need the machinery for those features, and could do without min_size and a separate small_buffer.
I decided to reject this API for now. It's too low-level and too error prone: https://github.com/capi-workgroup/decisions/issues/39#issuecomment-2396888574