importlib_metadata
importlib_metadata copied to clipboard
PackageMetadata.__getitem__ returns None
-
The first problem is almost certainly a bug, the annotation for
__getitem__
says it returns astr
but it returnsNone
if the metadata doesn't exist. https://github.com/python/importlib_metadata/blob/v4.11.2/importlib_metadata/_meta.py#L15For example:
from importlib.metadata import metadata print(metadata("pip")["asdfg"] is None) # True
(second problem moved to #374).
Hi @saaketp. Thanks for the report. I do believe this issue is a bug. I'm yet unsure if the proper fix is to update the protocol to match the implementation or to update the implementation to match the protocol. Historically, the metadata handling relied primarily on the interface of email.message.EmailMessage. That documentation indicates that None
is returned for missing values, so maybe the protocol should be updated to match.
Honestly, I'm conflicted. My instinct is that the current protocol definition is right and that __getitem__
should raise a KeyError
if a key is missing. I'm unsure why EmailMessage
deviated from this typical dict
behavior. @warsaw, do you know why EmailMessage
does not raise a KeyError
for missing headers? Do you have reason to think that importlib.metadata
should continue that tradition or should align instead with dict and other typical behavior for __getitem__
?
Separately, I'm disappointed that mypy
was no help here. As much grief as mypy gives, nitpicking and requiring lots of redundant type hints, it's simultaneously failed here to identify the reported issue, despite the fact that the return type is declared but the returned object is a subclass of a class that violates that declaration. It's conceivable I've missed an important detail that would have allowed mypy to do this validation here.
Perhaps the issue is that typeshed underdefines the return type as Any. Perhaps someone would like to fix that.
Looking at the issue that resulted in the change of the return type of email.message.EmailMessage.__getitem__
to Any, looks like it can contain lot more types that just str and Header (https://github.com/python/typeshed/issues/2863) and they chose to return Any
instead of object
for convenience, but it bites us here.
My vote would be to change the implementation to match the Protocol, I don't like __getitem__
returning None
.
Luckily we have a subclass of EmailMessage
already, so we can just override the __getitem__
to do an additional check and throw KeyError
if the key doesn't exist.
I've just filed ~~bpo-47060~~ python/cpython#91216 (originally https://github.com/python/typeshed/issues/7513) which I think might be dancing around the same root cause. Apologies for intruding; I just wanted to provide a cross-reference to BPO for completeness.
Spurred by considerations in that bpo issue, I'm inclined to say the implementation should be limited to raise a KeyError when an item is not defined in the metadata. I would like to confirm that this library itself does not rely heavily on that expectation.
This backward-incompatible change will likely cause some disruption, so an incremental approach is probably needed to deprecate the existing behavior (warn when None
result is returned for a missing key).
In 614a522, I confirmed that at least internally, there's no issue with raising KeyError on missing keys.
Since this change is likely to create disruption, I'd like to reserve it for a new backward-incompatible release to only go into new versions of CPython (3.12 or later), but since it's a refinement of behavior toward the specified behavior, I don't think it needs a deprecation period in CPython's implementation.
Before the release could even hit PyPI, the backward incompatibility struck. The release process failed because twine
is relying on __getitem__
returning None
.
This finding leads me to believe there may need to be a deprecation period for this change.
What's the proper way to write code that can handle a missing item?
If you wrap the getitem in a try
/except
, you still get a deprecation warning.
cc @hroncok
I would also like some advice on how code should be structured for cases where one needs to query for metadata that may or may not be there, without triggering the warning.
What's the proper way to write code that can handle a missing item? If you wrap the getitem in a
try
/except
, you still get a deprecation warning.
How about .get(<key>, None)
?
Here's a helper you could use:
def future_getitem(metadata: PackageMetadata, key: str) -> str:
"""retrieve a key from metadata similar to a future implementation of PackageMetadata.__getitem__"""
val = metadata.get(key, None)
if val is None:
raise KeyError(key)
return val
Should importlib_metadata provide this helper?
Finally, importlib_metadata 8 provides the desired behavior, raising KeyError on a missing key. If you want the KeyError behavior, depend on and use importlib_metadata 8 or later (to be incorporated in Python 3.14).