typing
Buffer protocol types
We had several typeshed issues and pull requests lately that try to work around the fact that there is no way to express that a method receives any object following the buffer protocol. The typing documentation mentions ~BytesType~ ByteString, which is an alias for Union[bytes, memoryview, bytearray] (and that bytes can be used as an alias in argument types), but this is missing other types such as array.array or user-defined objects. As this is a C API protocol, just defining such a protocol in typeshed is not possible.
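To make the gap concrete, here is a minimal sketch, assuming only the standard library. The helper names is_listed and supports_buffer are made up for illustration; the second one simply probes an object with memoryview(), which is one way to check buffer support at runtime.

```python
import array


def is_listed(obj) -> bool:
    # Hypothetical helper: the three types named by the documented union.
    return isinstance(obj, (bytes, bytearray, memoryview))


def supports_buffer(obj) -> bool:
    # Hypothetical helper: memoryview() raises TypeError for objects
    # that do not implement the buffer protocol.
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False


values = array.array("B", [1, 2, 3])

print(is_listed(values))           # array.array is not one of the union members
print(supports_buffer(values))     # but it does export a buffer
print(supports_buffer([1, 2, 3]))  # a plain list does not
```

An annotation limited to the three listed types would reject values here, even though any C function that takes a buffer accepts it.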
Since the buffer protocol is a standard Python feature and we need to be able to use it in type annotations, it makes sense to me to add typing.Buffer or similar. Perhaps we can make concrete buffer classes inherit from Buffer, so we'd do something like
# typing.pyi
class Buffer: ... # empty at the Python level
# builtins.pyi
class bytes(Buffer, Sequence[int]): ...
We'd have to add Buffer also to typing_extensions.
(Side note: I think you mean ByteString (https://docs.python.org/3/library/typing.html#typing.ByteString), not BytesType.)
I like what @JelleZijlstra proposes (obviously the type should be abstract).
Actually, it's a bit more complicated, since some buffers are writable and others aren't (see the types in python/typeshed#2610). This is controlled by whether the type responds to requests with PyBUF_WRITABLE (https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE) set. So here's a revised proposal:
# typing.pyi
class ReadableBuffer: ... # abstract, no Python attributes; corresponds to C types that expose buffers without PyBUF_WRITABLE set
class WriteableBuffer(ReadableBuffer): ... # same; corresponds to C types that expose buffers with PyBUF_WRITABLE set
# builtins.pyi
class bytes(ReadableBuffer, Sequence[int]): ...
class bytearray(WriteableBuffer, Sequence[int]): ...
There are a number of other flags controlling format, dimensions, etc., but I'm not sure those could be easily expressed in the type system. Perhaps we could implement the format flag by making Buffer generic over a typevar that is restricted to certain types, but Python types don't map cleanly to C types, so that doesn't seem like it would work well.
python/typeshed#2895 is one example where this could be useful.
~python/typeshed#2895 is one example where this could be useful.~
~It is fine to experiment with such things defined locally (with an underscore, like _Reader), we can put something in typing later, when we will have more experience.~
Oh sorry, this is a wrong issue, disregard my last comment.
Is this something that could be considered? What steps are necessary to continue?
@srittau If it is not too hard maybe you can directly make a PoC PR to typeshed, so that we can discuss the details (IIUC you want this to be a stub-only feature).
cc @gvanrossum
I believe there is an open python issue about this: https://bugs.python.org/issue27501
Until a proper type for the buffer protocol is available, would it make sense to at least partially fix this (in places like zlib) with "better-than-just-bytes" coverage workarounds? For example:
Union[bytes, bytearray, memoryview]
It seems like that would cover the vast majority of use cases.
One example of something that is missing from that type definition but works, at least for zlib.compress, is an array.array of bytes. I can't seem to figure out how to force it to be an array of bytes from a typing perspective, though.
Also - typing.ByteString (mentioned in the original post) doesn't seem appropriate in all cases since it looks like it is Sequence[int] in there. A sequence of ints is not accepted by zlib.decompress, for example, although my dusty memory uncertainly thinks that a sequence of ints was supposed to be legit for a true buffer protocol (I'm really not sure, though).
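A quick way to check both observations is the sketch below, assuming CPython's zlib module. The payload contents and the list of ints are arbitrary illustration values.

```python
import array
import zlib

# An array of unsigned bytes exports a buffer, so zlib.compress accepts it.
payload = array.array("B", b"hello buffer protocol")
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)

# A plain list of ints is a Sequence[int] but not a buffer exporter.
try:
    zlib.decompress([120, 156, 3, 0, 0, 0, 0, 1])
    list_accepted = True
except TypeError:
    list_accepted = False

print(restored == payload.tobytes())
print(list_accepted)
```

So the union is too narrow (it excludes array.array) while Sequence[int] is too wide (it admits list), which is the mismatch described above.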
As mentioned in #997 it would also be useful to be able to specify length for any buffer types, in particular where a fixed length string is expected.
Bump :)
@ilevkivskyi @srittau What's the process for getting this accepted? Does this require a new PEP? I'd be open to working on this, but I'm not sure where to start.
A related question is how this would be handled in Python given the move to builtins for type hints (like with PEP 585)
By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:
https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191
I suppose that means those should be moved here in order to consider this issue resolved?
I think it doesn't necessarily require a PEP: we could just add the types to typing.pyi and typing_extensions.pyi (as I suggested in https://github.com/python/typing/issues/593#issuecomment-440327001 a long time ago). The process could be similar to what we just did with reveal_type(): a typing-sig discussion, followed by direct implementation in CPython.
By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:
https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191
I suppose that means those should be moved here in order to consider this issue resolved?
This doesn't seem quite right either, as memoryview and mmap are being treated as writeable. However, they may or may not be. For example, a memoryview of a bytes object or an mmap of a read-only file is not writeable.
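The memoryview half of that point can be shown with a small sketch (the variable names are illustrative; the mmap case is left out because it needs a file on disk):

```python
# Same static type (memoryview), different writability at runtime.
read_only_view = memoryview(b"abc")
backing = bytearray(b"abc")
writable_view = memoryview(backing)

print(read_only_view.readonly)
print(writable_view.readonly)

# Writing through the writable view changes the underlying bytearray.
writable_view[0] = ord("x")

# Writing through the read-only view raises TypeError.
try:
    read_only_view[0] = ord("x")
    write_succeeded = True
except TypeError:
    write_succeeded = False
```

Whether a given memoryview is writable depends on the object it wraps, so a stub that places memoryview in a single writable or read-only bucket will be wrong for some instances.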
I am preparing a PEP to support checking the buffer protocol not only in the type system, but also at runtime. A first draft is at https://github.com/JelleZijlstra/peps/blob/bufferpep/pep-9999.rst. Any early feedback is welcome.
@JelleZijlstra LGTM so far, although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.
although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.
Readonly is only one of a number of attributes that are important for determining whether a buffer can be used. Some libraries can only deal with contiguous buffers, or native endianness, or aligned data. It looks to me like the PEP does the right thing here - best to support either all attributes or none, but not make readonly more important than other attributes.
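As a rough illustration of those other attributes, memoryview exposes several of them at runtime. This is only a sketch; the sample data is arbitrary.

```python
import array

data = memoryview(b"abcdefgh")
strided = data[::2]  # every second byte: same object type, non-contiguous layout

print(data.contiguous)
print(strided.contiguous)
print(strided.tobytes())

# The element format is another per-buffer property invisible to the type system.
ints = memoryview(array.array("i", [1, 2, 3]))
print(ints.format, ints.itemsize)
```

All three objects are memoryview instances, yet a consumer that needs contiguous single-byte data could use only the first one as-is.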
This is now PEP 688: https://peps.python.org/pep-0688/.
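For readers landing here later, a minimal sketch of what an annotation can look like with the PEP in place. PEP 688 adds collections.abc.Buffer in Python 3.12; the fallback alias for older versions and the byte_length function are assumptions made for this example only (typing_extensions also provides a Buffer backport).

```python
import array
import sys
from typing import Union

if sys.version_info >= (3, 12):
    from collections.abc import Buffer
else:
    # Rough stand-in for older interpreters, for illustration only.
    Buffer = Union[bytes, bytearray, memoryview]


def byte_length(data: Buffer) -> int:
    # Hypothetical function: size in bytes of any buffer-exporting object.
    with memoryview(data) as view:
        return view.nbytes


samples = array.array("H", [1, 2, 3])
print(byte_length(b"abcd"))
print(byte_length(samples))
```

As discussed above, the single Buffer type does not distinguish read-only from writable buffers.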
Fixed by PEP-688.