typeshed

`hashlib.file_digest` signature is overly strict

Open ncoghlan opened this issue 1 year ago • 4 comments

Using hashlib.file_digest with a file object obtained via pathlib.Path.open failed to typecheck with a protocol compatibility error:

src/create_bundles.py:223: error: Argument 1 to "file_digest" has incompatible type "FileIO"; expected "_BytesIOLike | _FileDigestFileObj"  [arg-type]
src/create_bundles.py:223: note: Following member(s) of "FileIO" have conflicts:
src/create_bundles.py:223: note:     Expected:
src/create_bundles.py:223: note:         def readinto(self, bytearray, /) -> int
src/create_bundles.py:223: note:     Got:
src/create_bundles.py:223: note:         def readinto(self, Buffer, /) -> int | None

I'm reasonably sure this is a typeshed issue rather than a mypy issue.

My first thought was that this looks like an internal inconsistency in typeshed, as _FileDigestFileObj is declared specifically with bytearray rather than the more general WriteableBuffer (hence filing the issue here).

However, my second thought was that this signature declaration is presumably as it is because file_digest will specifically pass a bytearray instance to readinto, and bytearray is compatible with Buffer, so mypy shouldn't be complaining about that. That means the conflict is presumably on the return type rather than on parameter types.

The declared type signature nominally indicates that file_digest can't cope with readinto returning None, and mypy is correctly flagging that as inconsistent with the way typeshed declares the file and socket IO types (allowing them to return None here).
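The mismatch described above can be reproduced in miniature. The following is only a sketch: `StrictReader`, `RawLikeReader`, and `first_read_size` are hypothetical names invented for illustration, not the actual typeshed declarations, and `Union[bytearray, memoryview]` is a rough stand-in for `Buffer`.

```python
from typing import Optional, Protocol, Union


class StrictReader(Protocol):
    # Hypothetical consumer-side protocol: promises that readinto returns an int.
    def readinto(self, buf: bytearray, /) -> int: ...


class RawLikeReader:
    # Hypothetical producer: accepts a wider parameter type (fine, since
    # parameters are contravariant) but may return None (the actual conflict).
    def readinto(self, buf: Union[bytearray, memoryview], /) -> Optional[int]:
        return None


def first_read_size(reader: StrictReader) -> int:
    return reader.readinto(bytearray(4))


# A type checker would be expected to reject this call because Optional[int]
# is not compatible with int; nothing stops it at runtime.
result = first_read_size(RawLikeReader())
```

At runtime the call goes through and `result` is `None`, which is exactly the situation the declared return type claims cannot happen.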

Searching the typeshed code suggests this particular inconsistency exists across several different IO consumer type declarations: https://github.com/search?q=repo%3Apython%2Ftypeshed%20readinto&type=code

ncoghlan, Jul 23 '24 08:07

I/O types are a mess, which is why we're moving more into tight protocols for arguments as used in hashlib.

I think your analysis is correct. In fact, file_digest can't handle None return values:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/hashlib.py#L232-L236
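To make the failure mode concrete, here is a minimal sketch. It assumes a read loop that only checks for `size == 0`, modeled loosely on the pattern in the linked lines rather than copied from them. `FlakyReader` and `naive_digest` are hypothetical names. Because `view[:None]` is the whole buffer, a `None` return silently feeds the entire (stale) buffer into the hash.

```python
import hashlib
from typing import Optional


class FlakyReader:
    """Hypothetical raw reader whose first readinto() call returns None."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self._blocked_once = False

    def readinto(self, buf) -> Optional[int]:
        if not self._blocked_once:
            self._blocked_once = True
            return None  # simulates "no data available yet"
        chunk = self._data[self._pos:self._pos + len(buf)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def naive_digest(reader, bufsize: int = 8) -> str:
    # Simplified loop that treats only 0 as special (assumption, see lead-in).
    digest = hashlib.sha256()
    buf = bytearray(bufsize)
    view = memoryview(buf)
    while True:
        size = reader.readinto(buf)
        if size == 0:
            break  # EOF
        digest.update(view[:size])  # size=None hashes the whole buffer
    return digest.hexdigest()


wrong = naive_digest(FlakyReader(b"hello world"))
expected = hashlib.sha256(b"hello world").hexdigest()
```

In this sketch `wrong` differs from `expected`: the eight zero bytes of the untouched buffer are hashed before the real data, and no error is raised.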

~~On the other hand it seems that FileIO.readinto can't return None:~~

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L657-L684

~~At the moment, FileIO inherits readinto from RawIOBase. I think we just need to override it in FileIO with a more precise return type.~~

srittau, Jul 23 '24 09:07

> On the other hand it seems that FileIO.readinto can't return None:

Doesn't it return None in this code path here? https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L678

AlexWaygood, Jul 23 '24 13:07

I blame the heat and my tiredness for overlooking this quite obvious code path ... That said, I'd be interested to know in which situations FileIO works with non-blocking I/O. One case I can see is here:

https://github.com/python/cpython/blob/2c1b1e7a07eba0138b9858c6f2bea3cae9af0808/Python/fileutils.c#L1887

Another is probably if an opened file descriptor has the non-blocking flag set.

I see two possible solutions: make FileIO.readinto() return int | MaybeNone, as the situations where this happens are a bit esoteric. That said, I actually think that this uncovered a real bug in file_digest(), as it doesn't handle the None case correctly when it should.
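For illustration, one way a consumer loop could handle the `None` case explicitly is sketched below. This is a hedged example, not the fix adopted anywhere: `guarded_digest` and `AlwaysBlocked` are hypothetical names, and raising `BlockingIOError` is just one plausible choice (retrying or waiting would be others).

```python
import hashlib
import io


class AlwaysBlocked:
    """Hypothetical reader that never has data ready."""

    def readinto(self, buf):
        return None


def guarded_digest(reader, bufsize: int = 8) -> str:
    digest = hashlib.sha256()
    buf = bytearray(bufsize)
    view = memoryview(buf)
    while True:
        size = reader.readinto(buf)
        if size is None:
            # Assumed policy: refuse to hash stale buffer contents.
            raise BlockingIOError("readinto() returned None: no data available")
        if size == 0:
            break  # EOF
        digest.update(view[:size])
    return digest.hexdigest()


ok = guarded_digest(io.BytesIO(b"hello world"))

try:
    guarded_digest(AlwaysBlocked())
    raised = False
except BlockingIOError:
    raised = True
```

With a guard like this, a protocol that declares `readinto` as returning `int | None` would describe what the consumer actually tolerates.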

srittau, Jul 23 '24 14:07

In fact, file_digest is explicitly documented as taking a SocketIO object. And that can very much return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/socket.py#L713

srittau, Jul 23 '24 14:07