Buffer protocol types

Open srittau opened this issue 7 years ago • 20 comments

We have had several typeshed issues and pull requests lately that try to work around the fact that there is no way to express that a method accepts any object that supports the buffer protocol. The typing documentation mentions ~BytesType~ ByteString, which is an alias for Union[bytes, memoryview, bytearray] (and notes that bytes can be used as a shorthand in argument types), but this misses other types such as array.array or user-defined objects. As this is a C API protocol, simply defining such a protocol in typeshed is not possible.
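To make the gap concrete, here is a minimal sketch using only the standard library. The variable names are purely illustrative; the point is that an array.array exports a buffer even though it is not one of the three types in the union above.

```python
import array

# array.array supports the buffer protocol, but it is not one of the
# three types covered by the bytes/bytearray/memoryview union.
data = array.array("B", [104, 105])  # unsigned bytes spelling "hi"

in_union = isinstance(data, (bytes, bytearray, memoryview))

# memoryview() accepts any object that exports a buffer.
view = memoryview(data)
as_bytes = view.tobytes()

print(in_union)  # False
print(as_bytes)  # b'hi'
```

A function annotated with the union would therefore reject this argument at type-check time even though it works at runtime.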

srittau avatar Nov 20 '18 07:11 srittau

Since the buffer protocol is a standard Python feature and we need to be able to use it in type annotations, it makes sense to me to add typing.Buffer or similar. Perhaps we can make concrete buffer classes inherit from Buffer, so we'd do something like

# typing.pyi
class Buffer: ...  # empty at the Python level

# builtins.pyi
class bytes(Buffer, Sequence[int]): ...

We'd have to add Buffer also to typing_extensions.

(Side note: I think you mean ByteString (https://docs.python.org/3/library/typing.html#typing.ByteString), not BytesType.)

JelleZijlstra avatar Nov 20 '18 08:11 JelleZijlstra

I like what @JelleZijlstra proposes (obviously the type should be abstract).

ilevkivskyi avatar Nov 20 '18 11:11 ilevkivskyi

Actually, it's a bit more complicated, since some buffers are writable and others aren't (see the types in python/typeshed#2610). This is controlled by whether the type responds to requests with PyBUF_WRITABLE (https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE) set. So here's a revised proposal:

# typing.pyi
class ReadableBuffer: ...  # abstract, no Python attributes; corresponds to C types that expose buffers without PyBUF_WRITABLE set
class WriteableBuffer(ReadableBuffer): ...  # same; corresponds to C types that expose buffers with PyBUF_WRITABLE set

# builtins.pyi
class bytes(ReadableBuffer, Sequence[int]): ...
class bytearray(WriteableBuffer, Sequence[int]): ...

There are a number of other flags controlling format, dimensions, etc., but I'm not sure those could be easily expressed in the type system. Perhaps we could express the format flag by making Buffer generic over a typevar that is restricted to certain types, but Python types don't map cleanly to C types, so that doesn't seem like it would work well.
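A small sketch of why the format flag is hard to model with Python types, using only array.array and memoryview from the standard library (the variable names are illustrative): two buffers with different C element formats both hand back plain Python ints.

```python
import array

# Both arrays yield plain Python ints on indexing...
signed_bytes = array.array("b", [1, 2, 3])  # C signed char
c_ints = array.array("i", [1, 2, 3])        # C int
element_types = {type(signed_bytes[0]), type(c_ints[0])}

# ...but the buffers they export advertise different C formats and sizes.
formats = (memoryview(signed_bytes).format, memoryview(c_ints).format)
itemsizes = (memoryview(signed_bytes).itemsize, memoryview(c_ints).itemsize)

print(element_types)  # {<class 'int'>}
print(formats)        # ('b', 'i')
```

A hypothetical Buffer[int] would not distinguish these two, which is the mismatch described above.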

JelleZijlstra avatar Nov 20 '18 16:11 JelleZijlstra

python/typeshed#2895 is one example where this could be useful.

srittau avatar Mar 30 '19 17:03 srittau

~python/typeshed#2895 is one example where this could be useful.~

~It is fine to experiment with such things defined locally (with an underscore, like _Reader), we can put something in typing later, when we will have more experience.~

ilevkivskyi avatar Apr 02 '19 16:04 ilevkivskyi

Oh sorry, this is a wrong issue, disregard my last comment.

ilevkivskyi avatar Apr 02 '19 16:04 ilevkivskyi

Is this something that could be considered? What steps are necessary to continue?

srittau avatar Aug 07 '19 11:08 srittau

@srittau If it is not too hard maybe you can directly make a PoC PR to typeshed, so that we can discuss the details (IIUC you want this to be a stub-only feature).

cc @gvanrossum

ilevkivskyi avatar Aug 07 '19 12:08 ilevkivskyi

I believe there is an open Python issue about this: https://bugs.python.org/issue27501

christopher-hesse avatar Nov 26 '19 17:11 christopher-hesse

Until a proper type for the buffer protocol is available, would it make sense to at least partially fix this (in places like zlib) with "better-than-just-bytes" coverage workarounds? For example:

Union[bytes, bytearray, memoryview]

It seems like that would cover the vast majority of use cases.

One example that is missing from that type definition but works, at least for zlib.compress, is an array.array of bytes. I can't seem to figure out how to force it to be an array of bytes from a typing perspective, though.

Also, typing.ByteString (mentioned in the original post) doesn't seem appropriate in all cases, since it appears to be a Sequence[int]. A sequence of ints is not accepted by zlib.decompress, for example, although my dusty memory suggests that a sequence of ints was supposed to be legitimate for a true buffer protocol (I'm really not sure, though).
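The two observations above can be checked with a short sketch. It uses zlib.compress rather than zlib.decompress so that the input does not need to be valid compressed data; the variable names are illustrative only.

```python
import array
import zlib

payload = array.array("B", b"hello hello hello")

# An array of unsigned bytes exports a buffer, so zlib accepts it.
compressed = zlib.compress(payload)
round_trip = zlib.decompress(compressed)

# A plain list of ints is a Sequence[int] but exports no buffer.
try:
    zlib.compress([104, 105])
    list_accepted = True
except TypeError:
    list_accepted = False

print(round_trip)     # b'hello hello hello'
print(list_accepted)  # False
```

So being a Sequence[int] is neither necessary nor sufficient for an object to be accepted where a bytes-like object is required.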

rwarren avatar Mar 28 '20 20:03 rwarren

As mentioned in #997, it would also be useful to be able to specify a length for any buffer type, in particular where a fixed-length string is expected.

covert-encryption avatar Jan 02 '22 11:01 covert-encryption

Bump :)

@ilevkivskyi @srittau What's the process for getting this accepted? Does this require a new PEP? I'd be open to working on this, but I'm not sure where to start.

itaisteinherz avatar Feb 07 '22 19:02 itaisteinherz

A related question is how this would be handled in Python given the move to builtins for type hints (like with PEP 585)

jakirkham avatar Feb 07 '22 20:02 jakirkham

By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:

https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191

I suppose that means those should be moved here in order to consider this issue resolved?

itaisteinherz avatar Feb 07 '22 20:02 itaisteinherz

I think it doesn't necessarily require a PEP: we could just add the types to typing.pyi and typing_extensions.pyi (as I suggested in https://github.com/python/typing/issues/593#issuecomment-440327001 a long time ago). The process could be similar to what we just did with reveal_type(): a typing-sig discussion, followed by direct implementation in CPython.

JelleZijlstra avatar Feb 07 '22 20:02 JelleZijlstra

By the way, I just noticed that @JelleZijlstra's suggestion has been implemented:

https://github.com/python/typeshed/blob/494481a0aed2ef0e00bbe190476ace0b8261bce6/stdlib/_typeshed/__init__.pyi#L185-L191

I suppose that means those should be moved here in order to consider this issue resolved?

This doesn't seem quite right either, as memoryview and mmap are being treated as writeable. However, they may or may not be: for example, a memoryview of a bytes object or an mmap of a read-only file is not writeable.
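A quick sketch of the memoryview case (the mmap case behaves analogously but needs a file, so it is left out here). The names are illustrative; readonly is a standard memoryview attribute.

```python
# The same static type, memoryview, can wrap read-only or writable memory.
ro_view = memoryview(b"abc")             # backed by immutable bytes
rw_view = memoryview(bytearray(b"abc"))  # backed by a mutable bytearray

rw_view[0] = ord("x")  # fine: the underlying bytearray is modified

try:
    ro_view[0] = ord("x")
    ro_write_ok = True
except TypeError:
    # Writing through a read-only view is rejected at runtime.
    ro_write_ok = False

print(ro_view.readonly, rw_view.readonly)  # True False
print(ro_write_ok)                         # False
```

Writability is thus a property of the particular view, not of the memoryview class, so a purely class-based alias cannot capture it.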

jakirkham avatar Feb 23 '22 02:02 jakirkham

I am preparing a PEP to support checking the buffer protocol not only in the type system, but also at runtime. A first draft is at https://github.com/JelleZijlstra/peps/blob/bufferpep/pep-9999.rst. Any early feedback is welcome.

JelleZijlstra avatar Apr 22 '22 03:04 JelleZijlstra

@JelleZijlstra LGTM so far, although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.

srittau avatar Apr 22 '22 07:04 srittau

although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense.

Readonly is only one of a number of attributes that are important for determining whether a buffer can be used. Some libraries can only deal with contiguous buffers, or native endianness, or aligned data. It looks to me like the PEP does the right thing here - best to support either all attributes or none, but not make readonly more important than other attributes.
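As a small illustration of one such attribute, here is a sketch showing that two views over the same bytearray can differ in contiguity. It relies only on standard memoryview attributes; the variable names are illustrative.

```python
backing = bytearray(b"abcdef")

full = memoryview(backing)
strided = full[::2]  # every second byte; no copy is made

# Both are writable memoryviews, yet only one is contiguous.
print(full.contiguous)     # True
print(strided.contiguous)  # False
print(strided.tobytes())   # b'ace'
```

A consumer that requires contiguous memory could accept the first view and reject the second, even though a readable/writable split would treat them identically.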

rgommers avatar Apr 25 '22 14:04 rgommers

This is now PEP 688: https://peps.python.org/pep-0688/.

JelleZijlstra avatar Apr 25 '22 14:04 JelleZijlstra

Fixed by PEP-688.
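For readers landing here later, a minimal sketch of a runtime check built on PEP 688. It assumes that collections.abc.Buffer is available on Python 3.12 and newer; the helper name supports_buffer and the memoryview() fallback for older versions are illustrative choices, not something prescribed by the PEP.

```python
import array
import sys


def supports_buffer(obj: object) -> bool:
    """Return True if obj appears to support the buffer protocol."""
    if sys.version_info >= (3, 12):
        # PEP 688: buffer-exporting types are instances of this ABC.
        from collections.abc import Buffer

        return isinstance(obj, Buffer)
    # Fallback for older versions (an assumption of this sketch):
    # memoryview() raises TypeError for objects without a buffer.
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


print(supports_buffer(b"abc"))                      # True
print(supports_buffer(array.array("B", [1, 2])))    # True
print(supports_buffer([1, 2]))                      # False
```

In annotations, the same Buffer type can be used directly on 3.12+, for example as a parameter type for a function that accepts any bytes-like object.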

JelleZijlstra avatar May 22 '23 00:05 JelleZijlstra