importlib_metadata icon indicating copy to clipboard operation
importlib_metadata copied to clipboard

PackageMetadata.__getitem__ returns None

Open saaketp opened this issue 2 years ago • 12 comments

  • The first problem is almost certainly a bug, the annotation for __getitem__ says it returns a str but it returns None if the metadata doesn't exist. https://github.com/python/importlib_metadata/blob/v4.11.2/importlib_metadata/_meta.py#L15

    For example:

    from importlib.metadata import metadata
    print(metadata("pip")["asdfg"] is None)  # True
    

(second problem moved to #374).

saaketp avatar Mar 08 '22 22:03 saaketp

Hi @saaketp. Thanks for the report. I do believe this issue is a bug. I'm yet unsure if the proper fix is to update the protocol to match the implementation or to update the implementation to match the protocol. Historically, the metadata handling relied primarily on the interface of email.message.EmailMessage. That documentation indicates that None is returned for missing values, so maybe the protocol should be updated to match.

jaraco avatar Mar 12 '22 20:03 jaraco

Honestly, I'm conflicted. My instinct is that the current protocol definition is right and that __getitem__ should raise a KeyError if a key is missing. I'm unsure why EmailMessage deviated from this typical dict behavior. @warsaw, do you know why EmailMessage does not raise a KeyError for missing headers? Do you have reason to think that importlib.metadata should continue that tradition or should align instead with dict and other typical behavior for __getitem__?

jaraco avatar Mar 12 '22 20:03 jaraco

Separately, I'm disappointed that mypy was no help here. As much grief as mypy gives, nitpicking and requiring lots of redundant type hints, it's simultaneously failed here to identify the reported issue, despite the fact that the return type is declared but the returned object is a subclass of a class that violates that declaration. It's conceivable I've missed an important detail that would have allowed mypy to do this validation here.

Perhaps the issue is that typeshed underdefines the return type as Any. Perhaps someone would like to fix that.

jaraco avatar Mar 12 '22 20:03 jaraco

Looking at the issue that resulted in the change of the return type of email.message.EmailMessage.__getitem__ to Any, looks like it can contain lot more types that just str and Header (https://github.com/python/typeshed/issues/2863) and they chose to return Any instead of object for convenience, but it bites us here.

My vote would be to change the implementation to match the Protocol, I don't like __getitem__ returning None. Luckily we have a subclass of EmailMessage already, so we can just override the __getitem__ to do an additional check and throw KeyError if the key doesn't exist.

saaketp avatar Mar 12 '22 22:03 saaketp

I've just filed ~~bpo-47060~~ python/cpython#91216 (originally https://github.com/python/typeshed/issues/7513) which I think might be dancing around the same root cause. Apologies for intruding; I just wanted to provide a cross-reference to BPO for completeness.

DMRobertson avatar Mar 18 '22 18:03 DMRobertson

Spurred by considerations in that bpo issue, I'm inclined to say the implementation should be limited to raise a KeyError when an item is not defined in the metadata. I would like to confirm that this library itself does not rely heavily on that expectation.

This backward-incompatible change will likely cause some disruption, so an incremental approach is probably needed to deprecate the existing behavior (warn when None result is returned for a missing key).

jaraco avatar Mar 19 '22 19:03 jaraco

In 614a522, I confirmed that at least internally, there's no issue with raising KeyError on missing keys.

jaraco avatar Mar 19 '22 19:03 jaraco

Since this change is likely to create disruption, I'd like to reserve it for a new backward-incompatible release to only go into new versions of CPython (3.12 or later), but since it's a refinement of behavior toward the specified behavior, I don't think it needs a deprecation period in CPython's implementation.

jaraco avatar May 21 '22 16:05 jaraco

Before the release could even hit PyPI, the backward incompatibility struck. The release process failed because twine is relying on __getitem__ returning None.

This finding leads me to believe there may need to be a deprecation period for this change.

jaraco avatar Dec 18 '22 16:12 jaraco

What's the proper way to write code that can handle a missing item? If you wrap the getitem in a try/except, you still get a deprecation warning.

cc @hroncok

encukou avatar Oct 03 '23 10:10 encukou

I would also like some advice on how code should be structured for cases where one needs to query for metadata that may or may not be there, without triggering the warning.

int19h avatar Feb 23 '24 19:02 int19h

What's the proper way to write code that can handle a missing item? If you wrap the getitem in a try/except, you still get a deprecation warning.

How about .get(<key>, None)?

Here's a helper you could use:

def future_getitem(metadata: PackageMetadata, key: str) -> str:
    """retrieve a key from metadata similar to a future implementation of PackageMetadata.__getitem__"""
    val = metadata.get(key, None)
    if val is None:
        raise KeyError(key)
    return val

Should importlib_metadata provide this helper?

jaraco avatar Mar 20 '24 08:03 jaraco

Finally, importlib_metadata 8 provides the desired behavior, raising KeyError on a missing key. If you want the KeyError behavior, depend on and use importlib_metadata 8 or later (to be incorporated in Python 3.14).

jaraco avatar Jun 23 '24 21:06 jaraco