gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance
Speed up conversion from bytes-like objects like bytearray while keeping conversion from bytes stable.
On a --with-lto --enable-optimizations build on my 64-bit Linux box:
new:
from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns
old:
from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns
Benchmark code:
import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]
sample_bytearray = [bytearray(v) for v in sample_bytes]

def bench_convert(loops, values):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()
runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
- Issue: gh-132108
Small question, but how do we cope with classes that explicitly define .__bytes__() and are also buffer-like, such as custom bytes objects? (This is an edge case, but it can still be a breaking change.)
Note that PyObject_Bytes first calls __bytes__ and falls back to PyBytes_FromObject only if there is no __bytes__; buffer-like objects are considered only there, not before. So __bytes__ has a higher priority than the buffer interface.
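That priority is visible from pure Python. As a sketch (the Both class below is a made-up illustration, not anything in the PR): bytes() goes through the __bytes__ lookup first, while memoryview() talks straight to the buffer protocol.

```python
class Both(bytearray):
    # bytearray already exports a buffer natively;
    # add a conflicting __bytes__ on top of it.
    def __bytes__(self):
        return b"from-dunder"

obj = Both(b"raw")
# bytes() consults __bytes__ first, so the dunder wins:
assert bytes(obj) == b"from-dunder"
# memoryview() uses only the buffer protocol, so it sees the raw storage:
assert bytes(memoryview(obj)) == b"raw"
```

The same object thus yields different machine bytes depending on which path the caller takes, which is exactly the inconsistency discussed below.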
Instead, we should restrict ourselves to exact buffer objects, namely exact bytes and bytearray objects.
Cases where classes implement __bytes__() returning both valid (e.g. bytes) and invalid (e.g. str) values are tested in test_long, test_from_bytes, so I don't think any critical behavior changes there.
As you point out, if code exposes different machine bytes via the buffer protocol than via __bytes__(), this will change behavior: __bytes__() will not be run; instead the buffer export will be used. The same issue already exists between PyObject_Bytes and PyBytes_FromObject, since PyObject_Bytes checks __bytes__() first while PyBytes_FromObject tries the buffer protocol first and never checks __bytes__(). The code here uses PyObject_Bytes(). I don't think CPython strongly treats one or the other as "more correct".
Could match existing behavior by always checking for a __bytes__ member when !PyBytes_CheckExact() (avoiding the __bytes__() call for exact bytes, since it changes performance and wasn't present before). To me that isn't as good an implementation: it is slower (more branches), the code is more complex, and I prefer encouraging the buffer protocol for best performance.
Could restrict to known CPython types (bytes, bytearray, array, memoryview), but that lowers the usefulness to me: systems which implement both the buffer protocol and __bytes__ for efficiency couldn't use the newer and potentially more efficient buffer protocol here. It also requires more condition/type checks than PyObject_CheckBuffer.
Walking through common types passed to int.from_bytes() more explicitly:
- Exact bytes: the new code gets the data using a Py_buffer rather than incrementing the ref to the bytes (the PyBytes_CheckExact case). Perf test shows performance is stable for that.
- "Bytes-like" objects (subclasses of bytes, bytearray, memoryview, array): used the buffer protocol to copy before, and still do. Fewer calls/branches/checks before exporting the buffer, and the copy of that buffer into a PyBytes is removed. Perf test shows this is faster for bytearray, and likely for the other cases as well.
- list, tuple, iterables (other than str): PyObject_CheckBuffer fails, so the code calls PyObject_Bytes, which calls PyBytes_FromObject to handle them, same as before.
- str: doesn't export bytes, so this fails / raises a TypeError. In test_long, test_from_bytes validates that behavior. Behavior is unchanged.
- Objects that implement __bytes__() but don't support the buffer protocol: tested in test_long, test_from_bytes (ValidBytes, InvalidBytes, RaisingBytes). These behave as before: PyObject_CheckBuffer fails, so the code calls PyObject_Bytes, same as before.
- Objects that implement __bytes__() and support the buffer protocol: the __bytes__() function will no longer be called. If it broke the API contract by returning str, for instance, the code will now use the buffer protocol to get the underlying machine bytes instead of raising an exception.
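The unchanged paths above can be spot-checked from Python; the values follow from the documented int.from_bytes semantics (byteorder spelled out for clarity):

```python
# Exact bytes, a bytearray (buffer export), and a memoryview all decode the same:
assert int.from_bytes(b'\x01\x00', byteorder='big') == 256
assert int.from_bytes(bytearray(b'\x01\x00'), byteorder='big') == 256
assert int.from_bytes(memoryview(b'\x01\x00'), byteorder='big') == 256

# Iterables of integers go through PyBytes_FromObject:
assert int.from_bytes([1, 0], byteorder='big') == 256

# str exports no buffer and defines no __bytes__, so it raises TypeError:
try:
    int.from_bytes('10', byteorder='big')
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for str input")
```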
This is a breaking change. Example.
The docs say:
Called by bytes to compute a byte-string representation of an object. This should return a bytes object. The object class itself does not provide this method.
If b'X' is a byte-string representation of b'a', you are probably correct. Otherwise it's just an example that you could break something by overriding dunder methods in subclasses. Say,
>>> class int2(int):
...     def __float__(self):
...         return 3.14
...
>>> float(int2(123))
3.14
This is an example of the method resolution order changing. It now ignores a custom __bytes__ method, and I don't think that overriding __bytes__ on a bytes subclass is an artificial example; I am pretty sure people use that in the wild.
The reverse logic applies: the PR's author must prove that it does not break things.
> It now ignores custom __bytes__ method
I would say that if you expose something different via the buffer protocol than via the __bytes__ dunder, it's your fault, isn't it? (Though this constraint isn't documented explicitly.) Just as we could probably assume that float(int2(123)) == float(int(int2(123))).
I'll see if I can make a largely performance-neutral version that checks __bytes__ before using the buffer protocol. The potential disconnect between __bytes__() and __buffer__() concerns me; it feels like a source of bugs that are easy to write and hard to detect until they show up somewhere as a problem. Wondering if there's an efficient way to say something like "if __bytes__() is set, __buffer__() should be cleared (or defaulted to memoryview(__bytes__()))?"
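One cheap way to surface that disconnect at runtime could be a check like the following (consistent_exports is a hypothetical helper for illustration, not anything in CPython):

```python
def consistent_exports(obj):
    # Hypothetical helper: report whether obj's __bytes__() agrees with
    # its buffer-protocol export. Objects reachable through only one
    # path (or neither) have nothing to disagree about.
    try:
        via_buffer = bytes(memoryview(obj))
    except TypeError:
        return True  # no buffer export
    dunder = getattr(type(obj), '__bytes__', None)
    if dunder is None:
        return True  # no __bytes__
    return bytes(dunder(obj)) == via_buffer

class Mismatch(bytes):
    def __bytes__(self):
        return b'other'

assert consistent_exports(b'abc')                # exact bytes: paths agree
assert not consistent_exports(Mismatch(b'abc'))  # disconnect detected
```

Doing the equivalent eagerly at class-creation time would need a hook in type construction, which is part of why this feels like larger-proposal territory.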
>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
...     def __buffer__(self, flags):
...         return memoryview(b'c')
...
>>> class same_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
...     def __buffer__(self, flags):
...         return memoryview(b'b')
...
>>> int.from_bytes(distinct_bytes_buffer(b'a'))
99
>>> int.from_bytes(same_bytes_buffer(b'a'))
98
>>> int.from_bytes(b'a')
97
>>> int.from_bytes(b'b')
98
>>> int.from_bytes(b'c')
99
Some background pieces for reference: __bytes__ was added to bytes() in 3.11 (https://github.com/python/cpython/issues/68422) while type hints were being worked on. The buffer protocol predates that, dating to 3.0 (PEP 3118).
Another edge case around these: __bytes__() is only consulted once and must return an object that inherits from bytes, on which PyBytes_AsString is used, which returns the inline ob_sval storage. So if the bytes internal storage ob_sval, __buffer__(), and __bytes__() all differ, each of the three does sometimes get returned. I think it would be interesting to normalize to a specific behavior (straw man: always __buffer__() first), but that definitely isn't the case today (and I suspect changing it would take a larger proposal / PEP).
>>> class my_bytes(bytes):
...     def __bytes__(self):
...         return b"bytes"
...     def __buffer__(self, flags):
...         return memoryview(b"buffer")
...
>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return my_bytes(b"ob_sval")
...     def __buffer__(self, flags):
...         return memoryview(b"distinct_buffer")
...
>>> a = distinct_bytes_buffer(b"distinct_ob_sval")
>>> bytes(a)
b'ob_sval'
Created a branch which matches the resolution order of PyObject_Bytes. It gives a small performance improvement (~2%, avoids touching the reference count) in the common exact-bytes case and keeps most of the improvement for bytearray.
branch matching PyObject_Bytes order:
from_bytes_flags: Mean +- std dev: 27.3 ns +- 0.7 ns
bench_convert[bytes]: Mean +- std dev: 47.7 ns +- 0.4 ns
bench_convert[bytearray]: Mean +- std dev: 54.1 ns +- 0.9 ns
So bytearray goes from 64.7 ns +- 0.9 ns (main) to 54.1 ns +- 0.9 ns with change.
@sobolevn's example now returns the same value both before and after:
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
88
Should I incorporate here? (cc: @serhiy-storchaka, @sobolevn, @skirpichev)
full diff from main: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/bytes_first?collapse=1
diff from PR: https://github.com/cmaloney/cpython/commit/189f219b634f103bcbc64bfb22e81e9983794796
Also, the docs say: "The argument bytes must either be a bytes-like object or an iterable producing bytes." Something is wrong: either the implementation (in main) or the docs.
It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).
> It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).
Yes, this part of the sentence may at least be unclear. But I meant the first part, which has a reference to the glossary term.
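The second part of the sentence can be pinned down from the REPL: "an iterable producing bytes" means items that are integers in range(256), not one-byte bytes objects.

```python
# Integer items in range(256) are accepted:
assert int.from_bytes([255, 0], byteorder='big') == 0xFF00

# One-byte bytes objects as items are not:
try:
    int.from_bytes([b'\xff', b'\x00'], byteorder='big')
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for bytes items")
```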
Updated to use PyBytes_CheckExact first, as that case is common and it speeds up bytes relative to main. Also tested some bigger byte strings and added a speedup note: 128, 256, and 512 byte bytearray objects are ~1.2x faster. Thanks for the suggestion @picnixz.
from_bytes_flags: Mean +- std dev: [main] 28.3 ns +- 1.3 ns -> [exactbytes] 27.3 ns +- 0.3 ns: 1.04x faster
bench_convert[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.3 ns -> [exactbytes] 53.1 ns +- 5.1 ns: 1.24x faster
bench_convert_big[bytes]: Mean +- std dev: [main] 51.8 ns +- 0.6 ns -> [exactbytes] 50.3 ns +- 0.5 ns: 1.03x faster
bench_convert_big[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.0 ns -> [exactbytes] 53.5 ns +- 5.3 ns: 1.23x faster
Benchmark hidden because not significant (1): bench_convert[bytes]
Updated benchmark code
import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]
sample_bytearray = [bytearray(v) for v in sample_bytes]
sample_big = [
    b'\xff' * 128,
    b'\xff' * 256,
    b'\xff' * 512,
]
sample_big_ba = [bytearray(v) for v in sample_big]

def bench_convert(loops, values):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()
# Validate base bytes w/ flags doesn't change perf.
runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
runner.bench_time_func('bench_convert_big[bytes]', bench_convert, sample_big, inner_loops=10)
runner.bench_time_func('bench_convert_big[bytearray]', bench_convert, sample_big_ba, inner_loops=10)
Anything I can do to help close out this PR?
ping @serhiy-storchaka : Trying to find ways to close out this PR. I'm happy to update this to match the past resolution order or leave as is if prior review stands.
I think that we should fix the inconsistencies in one of two ways:
- Deprecate support of the __bytes__() method (and of an iterable of integers) anywhere except the explicit bytes constructor and a few other explicit cases (like formatting in b'%b'). This is already very uncommon behavior.
- Automatically add support of the buffer protocol in classes with the __bytes__ method.
I do not know which is better. We should open a discussion for this.
A more general solution would definitely be nice. bytearray (a mutable block of bytes) + int.from_bytes shows up in the performance profile of code I've been working on, and it would be nice to get the performance improvement in a shorter term than building consensus around reworking __bytes__ will likely take.
If I scope this change down to just a PyByteArray_CheckExact check (>10% perf improvement), would that be possible to land? If/when a broader consensus / direction is picked, this shouldn't block it, but it makes things faster now. I think exact bytearray is fairly safe because 1. bytearray doesn't implement __bytes__(), and 2. the pieces are Python-internal, so the buffer export and direct access should match currently. Would that be a workable shorter-term improvement?
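The first point is easy to confirm from Python; bytearray only reaches bytes() through the buffer path, so an exact-bytearray fast path cannot shadow a user-defined __bytes__:

```python
# bytearray defines no __bytes__ of its own...
assert not hasattr(bytearray, '__bytes__')

# ...so conversions already go through its buffer export
# (PyBytes_FromObject at the C level):
assert bytes(bytearray(b'\x01\x02')) == b'\x01\x02'
assert int.from_bytes(bytearray(b'\x01\x00'), byteorder='big') == 256
```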
LGTM. The latest PR is now backward compatible.
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
97
There's still a change in ordering between __bytes__ and __buffer__ currently, which I can remove with a small change:
On 3.14.2:
Python 3.14.2 (main, Jan 2 2026, 14:27:39) [GCC 15.2.1 20251112] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class X:
...     def __bytes__(self):
...         return b'b'  # 98
...     def __buffer__(self, flags):
...         return memoryview(b'a')  # 97
...
>>> int.from_bytes(X())
98
On 3.15 with this change:
>>> class X:
...     def __bytes__(self):
...         return b'b'  # 98
...     def __buffer__(self, flags):
...         return memoryview(b'a')  # 97
...
>>> int.from_bytes(X())
97