
gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance

Open cmaloney opened this issue 9 months ago • 18 comments

Speed up conversion from bytes-like objects such as bytearray while keeping conversion from bytes stable.

On a --with-lto --enable-optimizations build on my 64-bit Linux box:

new:

from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns

old:

from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns

Benchmark code:

import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]

sample_bytearray = [bytearray(v) for v in sample_bytes]

def bench_convert(loops, values):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()

runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
  • Issue: gh-132108

cmaloney, Apr 05 '25 03:04

Small question: how do we cope with classes that explicitly define .__bytes__() and are also buffer-like, such as custom bytes objects? This is an edge case, but it can still be a breaking change.

Note that PyObject_Bytes first calls __bytes__, then calls PyBytes_FromObject if there is no __bytes__, and only at that point are buffer-like objects considered. So __bytes__ has a higher priority than the buffer interface.
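A minimal way to observe this priority from Python using only built-ins: the hypothetical bytes subclass Tagged below returns different data from __bytes__() than it stores. bytes() consults __bytes__() first, while memoryview() goes straight to the buffer export.

```python
class Tagged(bytes):
    """Hypothetical bytes subclass whose __bytes__() disagrees with its stored data."""

    def __bytes__(self):
        return b'X'


t = Tagged(b'a')

# bytes() looks up __bytes__() before considering the buffer protocol.
via_dunder = bytes(t)

# memoryview() uses the buffer protocol, which exposes the stored data.
via_buffer = memoryview(t).tobytes()

print(via_dunder, via_buffer)  # b'X' b'a'
```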

Instead, we should restrict ourselves to exact buffer objects, namely exact bytes and bytearray objects.

picnixz, Apr 05 '25 10:04

Cases including classes that implement __bytes__() returning both valid (e.g. bytes) and invalid (e.g. str) values are covered by test_from_bytes in test_long, so I don't think any critical behavior changes there.

As you point out, if an object exposes different machine bytes through the buffer protocol than through __bytes__(), this will change behavior: __bytes__() will not be run, and only the buffer export will be used. The same issue already comes up between PyObject_Bytes and PyBytes_FromObject: PyObject_Bytes checks __bytes__() first, while PyBytes_FromObject tries the buffer protocol first and never checks __bytes__(). The code here currently uses PyObject_Bytes(). I don't think CPython consistently treats one or the other as "more correct".
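The two orders can be modeled in pure Python. This is a simplified sketch, not CPython's code: model_object_bytes and model_bytes_from_object are hypothetical stand-ins for PyObject_Bytes and PyBytes_FromObject, buffer support is probed by trying memoryview(), and error messages are approximations.

```python
def _buffer_bytes(obj):
    """Return obj's buffer contents as bytes, or None if it exports no buffer."""
    try:
        view = memoryview(obj)
    except TypeError:
        return None
    with view:
        return view.tobytes()


def model_bytes_from_object(obj):
    """Sketch of PyBytes_FromObject(): buffer first, __bytes__() is never consulted."""
    if type(obj) is bytes:
        return obj
    data = _buffer_bytes(obj)
    if data is not None:
        return data
    if isinstance(obj, str):
        raise TypeError("cannot convert 'str' object to bytes")
    # Any other iterable of ints in range(256); list has no __bytes__().
    return bytes(list(obj))


def model_object_bytes(obj):
    """Sketch of PyObject_Bytes(): exact bytes, then __bytes__(), then the above."""
    if type(obj) is bytes:
        return obj
    method = getattr(type(obj), '__bytes__', None)  # special methods live on the type
    if method is not None:
        result = method(obj)
        if not isinstance(result, bytes):
            raise TypeError("__bytes__ returned non-bytes")
        return result
    return model_bytes_from_object(obj)


class Tagged(bytes):
    def __bytes__(self):
        return b'X'


print(model_object_bytes(Tagged(b'a')))       # b'X'
print(model_bytes_from_object(Tagged(b'a')))  # b'a'
```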

We could match the existing behavior by always checking for a __bytes__ member when !PyBytes_CheckExact() (skipping the __bytes__() call for exact bytes, since that would change performance and wasn't done before). To me that isn't as good an implementation: it is slower (more branches), the code is more complex, and I prefer encouraging the buffer protocol for best performance.

We could restrict this to known CPython types (bytes, bytearray, array, memoryview), but to me that lowers the usefulness: systems which implement both the buffer protocol and __bytes__ for efficiency couldn't use the newer and potentially more efficient buffer protocol here. It also requires more conditions / type checks than a single PyObject_CheckBuffer.


Walking through common types passed to int.from_bytes() more explicitly:

  1. Exact bytes: the new code gets the data through a Py_buffer rather than incrementing the reference count of the bytes object (the PyBytes_CheckExact case). The perf test shows performance is stable for that.
  2. "Bytes-like" objects (subclasses of bytes, bytearray, memoryview, array): these used the buffer protocol to copy before and use it directly now. There are fewer calls/branches/checks before the buffer is exported, and the copy of that buffer into a PyBytes is removed. The perf test shows bytearray is faster; other cases likely are as well.
  3. list, tuple, and other iterables (except str): PyObject_CheckBuffer fails for these, so the code calls PyObject_Bytes, which calls PyBytes_FromObject to handle them, same as before.
  4. str: doesn't export a buffer, so conversion fails with a TypeError. test_from_bytes in test_long validates that behavior. Behavior is unchanged.
  5. Objects that implement __bytes__() but don't support the buffer protocol: tested in test_from_bytes in test_long (ValidBytes, InvalidBytes, RaisingBytes). These behave as before: PyObject_CheckBuffer fails, so the code calls PyObject_Bytes, which calls __bytes__(), same as before.
  6. Objects that implement __bytes__() and support the buffer protocol: __bytes__() will no longer be called. If it broke the API contract, for instance by returning str, the code will now use the buffer protocol to get the underlying machine bytes instead of raising an exception.
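To make the six cases concrete, here is a rough pure-Python model of the buffer-first dispatch described above. It is a sketch under assumptions, not the C implementation: model_from_bytes is a hypothetical name, PyObject_CheckBuffer is approximated by trying memoryview(), and _fallback_bytes only approximates PyObject_Bytes.

```python
def _fallback_bytes(obj):
    # Rough stand-in for PyObject_Bytes() on objects without a buffer:
    # __bytes__() if defined, otherwise an iterable of ints (str rejected).
    method = getattr(type(obj), '__bytes__', None)
    if method is not None:
        return method(obj)
    if isinstance(obj, str):
        raise TypeError("cannot convert 'str' object to bytes")
    return bytes(list(obj))


def model_from_bytes(obj, byteorder='big', *, signed=False):
    if type(obj) is bytes:                     # case 1: exact bytes
        return int.from_bytes(obj, byteorder, signed=signed)
    try:
        view = memoryview(obj)                 # cases 2 and 6: the buffer wins
    except TypeError:
        view = None
    if view is None:                           # cases 3, 4 and 5
        data = _fallback_bytes(obj)
        return int.from_bytes(data, byteorder, signed=signed)
    with view:
        return int.from_bytes(view, byteorder, signed=signed)


class OnlyDunder:                              # case 5: __bytes__() without a buffer
    def __bytes__(self):
        return b'\x01\x00'


class Both(bytes):                             # case 6: bytes subclass overriding __bytes__()
    def __bytes__(self):
        return b'X'


print(model_from_bytes(b'\x01\x00'))             # 256
print(model_from_bytes(bytearray(b'\x01\x00')))  # 256
print(model_from_bytes([1, 0]))                  # 256
print(model_from_bytes(OnlyDunder()))            # 256
print(model_from_bytes(Both(b'a')))              # 97: __bytes__() is skipped
```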

cmaloney, Apr 05 '25 18:04

This is a breaking change. Example.

The docs say:

Called by bytes to compute a byte-string representation of an object. This should return a bytes object. The object class itself does not provide this method.

If b'X' is a byte-string representation of b'a', you are probably correct. Otherwise it's just an example showing that you can break something by overriding dunder methods in subclasses. Say,

>>> class int2(int):
...     def __float__(self):
...         return 3.14
...         
>>> float(int2(123))
3.14

skirpichev, Apr 06 '25 13:04

This is an example showing that the method resolution order changes: it now ignores a custom __bytes__ method. I don't think overriding __bytes__ on a bytes subclass is an artificial example; I am pretty sure people use that in the wild.

The reverse is true: the PR's author must prove that it does not break things.

sobolevn, Apr 06 '25 13:04

It now ignores custom __bytes__ method

I would say that if you expose something different via the buffer protocol and the __bytes__ dunder, it's your fault, isn't it? (Though this constraint isn't documented explicitly.) Just as we could probably assume that float(int2(123)) == float(int(int2(123))).

skirpichev, Apr 06 '25 13:04

All commit authors signed the Contributor License Agreement.


python-cla-bot[bot], Apr 06 '25 13:04

I'll see if I can make a largely performance-neutral version that checks __bytes__ before using the buffer protocol. The potential disconnect between __bytes__() and __buffer__() concerns me; it feels like a source of bugs that are easy to write and hard to detect until they show up somewhere as a problem... I wonder if there's an efficient way to say something like "if __bytes__() is set, __buffer__() should be cleared (or default to memoryview(__bytes__()))"?

>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
... 
...     def __buffer__(self, flags):
...         return memoryview(b'c')
... 
... 
... class same_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
... 
...     def __buffer__(self, flags):
...         return memoryview(b'b')
... 
>>> int.from_bytes(distinct_bytes_buffer(b'a'))
99
>>> int.from_bytes(same_bytes_buffer(b'a'))
98
>>> int.from_bytes(b'a')
97
>>> int.from_bytes(b'b')
98
>>> int.from_bytes(b'c')
99

cmaloney, Apr 06 '25 17:04

Some background for reference: __bytes__ was added to bytes in 3.11 (https://github.com/python/cpython/issues/68422) while type hints were being worked on. The buffer protocol predates that; it arrived in 3.0 (PEP 3118).

cmaloney, Apr 06 '25 17:04

Another edge case around these: __bytes__() is only called once and must return an object that inherits from bytes, on which PyBytes_AsString is used, returning its ob_sval inline storage. If a bytes object's internal ob_sval storage, __buffer__(), and __bytes__() disagree, then each of the three does sometimes get returned. I think it would be interesting to normalize on a specific behavior (straw man: always __buffer__() first), but that definitely isn't the case today (and I suspect it would take a larger proposal / PEP to change?).

>>> class my_bytes(bytes):
...     def __bytes__(self):
...         return b"bytes"
... 
...     def __buffer__(self, flags):
...         return memoryview(b"buffer")
... 
>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return my_bytes(b"ob_sval")
... 
...     def __buffer__(self, flags):
...         return memoryview(b"distinct_buffer")
... 
>>> a = distinct_bytes_buffer(b"distinct_ob_sval")
>>> bytes(a)
b'ob_sval'

cmaloney, Apr 06 '25 19:04

I created a branch which matches the resolution order of PyObject_Bytes. It gives a small performance improvement (~2%, by avoiding touching the reference count) in the common exact-bytes case and keeps most of the improvement for bytearray.

branch matching PyObject_Bytes order:

from_bytes_flags: Mean +- std dev: 27.3 ns +- 0.7 ns
bench_convert[bytes]: Mean +- std dev: 47.7 ns +- 0.4 ns
bench_convert[bytearray]: Mean +- std dev: 54.1 ns +- 0.9 ns

So bytearray goes from 64.7 ns +- 0.9 ns (main) to 54.1 ns +- 0.9 ns with change.

@sobolevn's example now returns the same value both before and after:

>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
... 
>>> int.from_bytes(X(b'a'))
88

Should I incorporate this here? (cc: @serhiy-storchaka, @sobolevn, @skirpichev)


full diff from main: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/bytes_first?collapse=1

diff from PR: https://github.com/cmaloney/cpython/commit/189f219b634f103bcbc64bfb22e81e9983794796

cmaloney, Apr 08 '25 07:04

Also, the docs say: "The argument bytes must either be a bytes-like object or an iterable producing bytes." Something is wrong: either the implementation (in main) or the docs.

skirpichev, Apr 08 '25 10:04

It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).
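For illustration, both accepted input kinds using only the standard library: an iterable producing integers in range(256), and a bytes-like object (anything exporting a buffer, such as array.array or memoryview). An iterable of bytes objects is rejected, since each item must be an int.

```python
from array import array

# Iterables producing integers in range(256).
from_list = int.from_bytes([255, 0, 0], 'big')    # 0xff0000
from_iter = int.from_bytes(iter([1, 0]), 'big')   # 0x0100

# Bytes-like objects, i.e. objects supporting the buffer protocol.
from_array = int.from_bytes(array('B', [1, 0]), 'big')
from_view = int.from_bytes(memoryview(b'\x01\x00'), 'big')

# An iterable of bytes objects is not accepted: items must be ints.
try:
    int.from_bytes([b'\x01'], 'big')
    rejected = False
except TypeError:
    rejected = True

print(from_list, from_iter, from_array, from_view, rejected)
# 16711680 256 256 256 True
```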

serhiy-storchaka, Apr 08 '25 11:04

It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).

Yes, this part of the sentence is at least unclear. But I meant the first part, which references the glossary term.

skirpichev, Apr 08 '25 11:04

Updated to check PyBytes_CheckExact first, as that case is common and it speeds up bytes relative to main. I also tested some bigger byte strings and added a speedup note for 128-, 256-, and 512-byte bytearray objects, which are ~1.2x faster. Thanks for the suggestion, @picnixz!

from_bytes_flags: Mean +- std dev: [main] 28.3 ns +- 1.3 ns -> [exactbytes] 27.3 ns +- 0.3 ns: 1.04x faster
bench_convert[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.3 ns -> [exactbytes] 53.1 ns +- 5.1 ns: 1.24x faster
bench_convert_big[bytes]: Mean +- std dev: [main] 51.8 ns +- 0.6 ns -> [exactbytes] 50.3 ns +- 0.5 ns: 1.03x faster
bench_convert_big[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.0 ns -> [exactbytes] 53.5 ns +- 5.3 ns: 1.23x faster

Benchmark hidden because not significant (1): bench_convert[bytes]
Updated benchmark code

import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]
sample_bytearray = [bytearray(v) for v in sample_bytes]

sample_big = [
    b'\xff' * 128,
    b'\xff' * 256,
    b'\xff' * 512
]
sample_big_ba = [bytearray(v) for v in sample_big]

def bench_convert(loops, values):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0


runner = pyperf.Runner()

# Validate that base bytes with flags doesn't change perf.
runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)

runner.bench_time_func('bench_convert_big[bytes]', bench_convert, sample_big, inner_loops=10)
runner.bench_time_func('bench_convert_big[bytearray]', bench_convert, sample_big_ba, inner_loops=10)

cmaloney, Apr 09 '25 04:04

Anything I can do to help close out this PR?

cmaloney, Oct 09 '25 18:10

ping @serhiy-storchaka: I'm trying to find ways to close out this PR. I'm happy to update it to match the previous resolution order, or leave it as is if the prior review stands.

cmaloney, Oct 15 '25 07:10

I think that we should fix inconsistencies in one of two ways:

  • Deprecate support for the __bytes__() method (and for an iterable of integers) everywhere except the explicit bytes constructor and a few other explicit cases (like formatting with b'%b'). This is already very uncommon behavior.
  • Automatically add support for the buffer protocol in classes with the __bytes__() method.

I do not know which is better. We should open a discussion about this.

serhiy-storchaka, Dec 31 '25 11:12

A more general solution would definitely be nice. bytearray (a mutable block of bytes) + int.from_bytes shows up in the performance of code I've been working on, and it would be nice to get the performance improvement in a shorter term than building consensus around reworking __bytes__ will likely take.

If I scope this change down to just a PyByteArray_CheckExact fast path (> 10% perf improvement), would that be possible to land? It shouldn't block a broader consensus / direction if and when one is picked, but it makes things faster now. I think exact bytearray is fairly safe because (1) bytearray doesn't implement __bytes__(), and (2) both pieces are CPython internals, so the buffer and direct access should currently match. Would that be a workable shorter-term improvement?
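A pure-Python sketch of the narrowed dispatch, under stated assumptions: model_from_bytes_scoped is a hypothetical name, and _object_bytes only approximates PyObject_Bytes. Only exact bytes and exact bytearray take a fast path; bytearray subclasses, which may define __bytes__(), keep the existing resolution order.

```python
def _object_bytes(obj):
    # Rough stand-in for PyObject_Bytes(): __bytes__() first, then buffer, then iterable.
    method = getattr(type(obj), '__bytes__', None)
    if method is not None:
        return method(obj)
    if isinstance(obj, str):
        raise TypeError("cannot convert 'str' object to bytes")
    try:
        with memoryview(obj) as view:
            return view.tobytes()
    except TypeError:
        pass
    return bytes(list(obj))


def model_from_bytes_scoped(obj, byteorder='big', *, signed=False):
    if type(obj) is bytes:           # existing fast path: use the object directly
        return int.from_bytes(obj, byteorder, signed=signed)
    if type(obj) is bytearray:       # proposed fast path: read the buffer directly
        with memoryview(obj) as view:
            return int.from_bytes(view, byteorder, signed=signed)
    # Everything else, including bytearray subclasses, keeps the old order.
    return int.from_bytes(_object_bytes(obj), byteorder, signed=signed)


class TaggedArray(bytearray):
    def __bytes__(self):
        return b'X'


print(hasattr(bytearray, '__bytes__'))                   # False
print(model_from_bytes_scoped(bytearray(b'\x01\x00')))   # 256
print(model_from_bytes_scoped(TaggedArray(b'a')))        # 88: __bytes__() still wins
```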

cmaloney, Jan 05 '26 05:01

LGTM. The latest PR is now backward compatible.

>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...         
>>> int.from_bytes(X(b'a'))
97

There's still a change in ordering between __bytes__ and __buffer__ currently, which I can remove with a small change:

On 3.14.2:

Python 3.14.2 (main, Jan  2 2026, 14:27:39) [GCC 15.2.1 20251112] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class X:
...     def __bytes__(self):
...         return b'b' # 98
...     def __buffer__(self, flags):
...         return memoryview(b'a') # 97
... 
>>> int.from_bytes(X())
98

On 3.15 with this change:

>>> class X:
...     def __bytes__(self):
...         return b'b' # 98
...     def __buffer__(self, flags):
...         return memoryview(b'a') # 97
... 
>>> int.from_bytes(X())
97

cmaloney, Jan 10 '26 19:01