cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-121710: Add PyBytesWriter API

Open vstinner opened this issue 1 year ago • 5 comments

  • Issue: gh-121710

📚 Documentation preview 📚: https://cpython-previews--121726.org.readthedocs.build/

vstinner avatar Jul 13 '24 19:07 vstinner

@picnixz: Such coding style review is not helping. It's the first version of the PR, we didn't discuss the API yet. I would prefer comments about the API rather than 7 comments about spaces and PEP 7. It's too early for that.

vstinner avatar Jul 14 '24 08:07 vstinner

Oh ok my bad (I'll try to reduce those kind of reviews in the future). Then, I have a suggestion about maybe having the possibility to expose something for byte substrings (similar to PyUnicodeWriter_WriteSubstring) as well as format strings like PyUnicodeWriter_Format and possibly 1-char bytes. While all those functions are equivalent PyBytesWriter_WriteBytes calls, if I were to ask for a specific one, it would be PyUnicodeWriter_Format which could be very useful for a public API (I don't think the V version is needed though since it's not exposed in the Unicode API).

So, ideally:

  • PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)
  • PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *) which accepts PyObject * arguments
  • PyBytesWriter_WriteBuffer(PyBytesWriter *, char **, const void *, Py_ssize_t) which accepts const void * arguments (the current WriteBytes)
  • PyBytesWriter_WriteChar(PyBytesWriter *, char **, char) -- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS*`)
  • PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t) (or any other name) for writing a sub-range of bytes object.

(Again, sorry for my nitpicky review).

picnixz avatar Jul 14 '24 08:07 picnixz

@picnixz: I would prefer to make the API as small as possible. Where do these use cases come from? Did you see usage in the current C code of CPython with the _PyBytesWriter API?

In your list, the most appealing is PyBytesWriter_Format() :-)

PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...)

While it's tempting to add as many features as PyUnicodeWriter API, I'm not convinced that this function is needed right now.

PyBytesWriter_WriteBytes(PyBytesWriter *, char **, PyObject *) which accepts PyObject * arguments

You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes), PyBytes_Size(bytes)). I never had to "write" a Python bytes object when using the _PyBytesWriter API. I'm not convinced that it's a common use case.

PyBytesWriter_WriteSubBytes(PyBytesWriter *, char **, PyObject *, Py_ssize_t, Py_ssize_t) (or any other name) for writing a sub-range of bytes object.

You can already call PyBytesWriter_WriteBytes(writer, &str, PyBytes_AsString(bytes) + start, PyBytes_Size(bytes) - skip) to write a substring. Again, I'm not convinced that it's common to write Python bytes objects.

PyBytesWriter_WriteChar(PyBytesWriter *, char *, char) -- efficient specialization for size = 1 (maybe not a char, but perhaps Py_UCS`)

PyBytesWriter is not PyUnicodeWriter. The intended usage is to allocate enough bytes and then just use str pointer. Example:

    ascii_data = PyBytesWriter_Prepare(&writer, 10);
    *ascii_data++ = ' ';
    memcpy(ascii_data, "abc", 3); ascii_data += 3;
    *ascii_data++ = '-';
    ...

WriteChar() would be way slower.

vstinner avatar Jul 14 '24 09:07 vstinner

Errr.. my idea was to have a parity with the PyUnicodeWriter API, but if you want to make it as small as possible, I'm fine with it (I never had to use the private API to write a bytes string either and I didn't check the code to see occurrences of it actually). I think the PyBytesWriter_Format(PyBytesWriter *, char **, const char *, ...) would be then the only one I would advocate for.

picnixz avatar Jul 14 '24 10:07 picnixz

Errr.. my idea was to have a parity with the PyUnicodeWriter API

Look at the current usage of _PyBytesWriter. You will see that the code is very different. The hot code doesn't use the writer but just "str" pointer. The writer is just there to manage memory.

PyUnicodeWriter is used for other use cases and so it deserves a different API.

vstinner avatar Jul 14 '24 16:07 vstinner

PR rebased on main to fix a merge conflict.

vstinner avatar Jul 27 '24 19:07 vstinner

I created a C API Working Group issue: https://github.com/capi-workgroup/decisions/issues/39

vstinner avatar Aug 05 '24 13:08 vstinner

Microbenchmark comparing bytearray to PyBytesWriter API:

Result:

$ env/bin/python bench.py
bytearray: Mean +- std dev: 93.0 ns +- 1.4 ns
byteswriter: Mean +- std dev: 23.4 ns +- 0.9 ns

PyBytesWriter is 4x faster (93.0 => 23.4 ns) than using bytearray.

vstinner avatar Aug 07 '24 09:08 vstinner

It seems that all the needed information could fit inside the memory of a PyBytesObject:

  • buffer -> _Py_CAST(PyBytesObject*, writer)->ob_sval
  • allocated -> Py_SIZE(writer)
  • min_size -> unused: the user can be track this separately
  • overallocate - unused (always 1 in this PR)
  • use_small_buffer, small_buffer - unused: small buffer is just an initial overallocation
  • use_bytearray - unused
  • the “current position” is tracked separately in str.

With that we could avoid allocating the big struct. With a very quick and dirty proof of concept, I get reasonable results, which suggest this could be useful for alternative implementations or future CPython versions:

This PR:
byteswriter: Mean +- std dev: 15.4 ns +- 0.6 ns
My proof of concept:
byteswriter: Mean +- std dev: 17.3 ns +- 1.0 ns

For the API, it would mean that PyBytesWriter_Prepare and PyBytesWriter_WriteBytes can reallocate the writer itself, and would take a PyBytesWriter **. That shouldn't hurt with the _PyBytesWriter implementation.

encukou avatar Aug 08 '24 13:08 encukou

min_size -> unused: the user can be track this separately

It's used if you call Prepare() multiple times.

use_small_buffer, small_buffer - unused: small buffer is just an initial overallocation

It's used. It does defer the Python bytes object until Finish() which can avoid having to call the inefficient _PyBytes_Resize() function.

With that we could avoid allocating the big struct.

Why do you want to avoid that? I would like to reuse the existing private _PyBytesWriter code which is well tested. According to benchmarks, allocating PyBytesWriter on the heap memory has no significant cost.

vstinner avatar Aug 09 '24 10:08 vstinner

Why do you want to avoid that?

It would be nice to keep the possibility open for other implementations, if possible.

I would like to reuse the existing private _PyBytesWriter code which is well tested.

But, this is not exposing all of the private interface, and I'm not sure if the existing tests apply to what's exposed here. In particular, the public API doesn't have:

  • disabling overallocation
  • a writable min_size field
  • bytearray support

An alternate implementation would not need the machinery for those features, and could do without min_size and a separate small_buffer.

encukou avatar Aug 21 '24 09:08 encukou

I decided to reject this API for now. It's too low-level and too error prone: https://github.com/capi-workgroup/decisions/issues/39#issuecomment-2396888574

vstinner avatar Oct 07 '24 13:10 vstinner