packaging
packaging copied to clipboard
Support a Custom Validation Policy for Metadata
I'm working on porting Warehouse over to using packaging.metadata
to drive our validation, and as I'm teasing apart the existing code it's become clear to me that our metadata validation in Warehouse falls into two categories:
- Validation that is required by the core metadata spec for the metadata to be valid at all.
- Validation that Warehouse adds on top of the core metadata spec for various application specific policies.
I'm currently implementing this in Warehouse by basically doing this:
from packaging.metadata import Metadata, InvalidMetadata
def parse_metadata(data: bytes) -> Metadata:
metadata = Metadata.from_email(data)
errors: list[InvalidMetadata] = []
# Additional checks
if errors:
raise ExceptionGroup("invalid metadata", errors)
return metadata
This of course works, but it has a few negatives:
- The metadata validation natively uses
ExceptionGroup
to collect all of the validation errors and raising them all at once, however there's not a clear and easy way to do that with our additional validation, so we don't see the extra validation errors if any "built in" validation errors happen. - We went to great lengths to support on demand validation of fields, allowing applications to have partial validation of only the data that they personally care about. However, implementing on demand "extra" validation would require either a wrapper around
Metadata
that dispatches to the underlyingMetadata
, fetches the value, then adds the extra validation.
An idea that popped into my mind is allowing passing in a "validation policy" of some sort, that would be used to inform the Metadata
class of these additional validation checks, and have them handled inline with all of the other validation.
A policy could look something like:
import readme
class MetadataValidationPolicy:
# TODO: These returns an InvalidMetadata, mostly to keep it similar to full_validate, however
# it's probably more ergonomic/expected to raise it? Maybe even full_vallidate should
# raise and when the metadata class calls it, it supports either raising an InvalidMetadata
# or a ExceptionGroup containing InvalidMetadata.
def validate_metadata_version(self, metadata_version: str) -> InvalidMetadata | None:
# PyPI currently only supports *some* Metadata versions, but not all of them, and even if
# we did support all current ones, new ones require integration work so we don't want to
# start accepting them implicitly.
if metadata_version not in {"1.0", "1.1", "1.2", "2.0", "2.1"}:
return InvalidMetadata("metadata-version", f"{metadata_version!r} is an unsupported metadata version")
def validate_version(self, version: Version) -> InvalidMetadata | None:
if version.local:
return InvalidMetadata("version", f"The use of local versions in {version!r} is not allowed.")
def full_validate(self, metadata: Metadata) -> list[InvalidMetadata]:
errors: list[InvalidMetadata] = []
# This is sort of a bad example, because we wouldn't put this in here because we wouldn't
# want to throw away the rendered value, but just as an example.
if metadata.description:
description_content_type = metadata.description_content_type or "text/x-rst"
rendered = readme.render(metadata.description, description_content_type, use_fallback=False)
if rendered is None:
errors.append(InvalidMetadata("description", f"Failed to render the description using {description_content_type!r}"))
return errors
Given a policy, the question then becomes how would we inform the Metadata
class about it.
The easiest mechanism for doing so I think would be as an additional keyword parameter to the Metadata.from_*
methods, so something like:
metadata = Metadata.from_email(b"...", validation_policy=MetadataValidationPolicy())
metadata.metadata_version
We don't currently support creating a Metadata
instance dynamically other than through the from_*
methods, but you could imagine if we did then we could still use a keyword argument to handle that:
metadata = Metadata(..., validation_policy=MetadataValidationPolicy())
metadata.metadata_version
Of course that would mean we can't ever have an core metadata field named validation_policy
that we want to set through the constructor, but I think it's unlikely that we ever do (and of course if we are worried about it there are other tricks like using _validation_policy
or something.
We could even support subclassing Metadata
and setting a subclass wide validation policy, if a project wants that rather than manually passing it in to each invocation, using something like:
from packaging.metadata import Metadata as _Metadata
class Metadata(_Metadata, validation_policy=MetadataValidationPolicy):
pass
Of course, supporting this use case does add a non trivial amount of additional complexity to packaging.metadata
, and since these validations are purely additive it is entirely possible to implement this outside of the metadata library with relatively minor downsides, so I think it would be perfectly reasonable to declare this out of scope. I wanted to raise it as an idea though, as it kind of surfaced during my work of integrating packaging.metadata
into Warehouse.
It's at least out of scope of my free time. 😉 But I don't inherently object to this idea either.