python-mimeparse icon indicating copy to clipboard operation
python-mimeparse copied to clipboard

request: add support for parsing and matching suffixes

Open rafalkrupinski opened this issue 1 year ago • 9 comments

Since the times of XML media often have a suffix, for example application/vnd.oai.openapi+json. This conveys separate information on syntax and semantics.

Ability to match by group and suffix (in the example: application, json) would let applications to communicate the exact media type, and libraries to process such content in a generic way.

rafalkrupinski avatar Aug 23 '24 19:08 rafalkrupinski

matching by (group and (subtype or suffix)) to be exact

rafalkrupinski avatar Aug 23 '24 19:08 rafalkrupinski

Hi,

is your objective to add a function that given the result of parse_mime_type('application/vnd.oai.openapi+json') that would be ('application', 'vnd.oai.openapi+json', {}), can take the second index and split it into ('vnd.oai.openapi', 'json')?

I guess we could add it.

Changing the result of parse_mime_type could also be an option, but it would be in a backward compatible way

CaselIT avatar Aug 27 '24 19:08 CaselIT

I was thinking of:

>>> parse_mime_type('application/vnd.oai.openapi+json', suffix=True)
('application', 'json', {})
>>> parse_mime_type('application/json', suffix=True)
('application', 'json', {})

This way keeps compatibility and best_match wouldn't have to handle two cases (split and non-split), but I'm assuming that no caller would ever want both full type and suffix.

Also, is the part between '/' and '+' useful for anything? I thought you either work with full media type application/vnd.oai.openapi+json or just the serialization media type application/json. That part doesn't even have a name, AFAIK.

rafalkrupinski avatar Aug 27 '24 21:08 rafalkrupinski

If we introduce a new flag we can also return a 4-tuple when it's true, since it's a new feature.

Or have the second item of the 3-tuple be a 2-tuple (or a named tuple) instead of a string. This would work better if the flag is called split_siffix

@vytas7 what your opinion on this?

CaselIT avatar Aug 28 '24 06:08 CaselIT

To be honest, I'm not a big fan of adding this to the current interface, because it is not really in the spirit of RFC 9110, Section 12 (Content Negotiation). The RFC does not really define a way to provide wildcards (*) to only a segment of media type or subtype.

Although I do understand the value proposition in some scenarios, could you tell more about your use case @rafalkrupinski?

Could we maybe somehow extend these matching functions by adding an optional regex parameter to match things? Dunno how exactly it could look like.

vytas7 avatar Aug 28 '24 08:08 vytas7

httpx.Response.json() can deserialize the body if it's JSON. The current implementation doesn't check content-type header, but imagine similar scenario

def python(self: 'Request')->dict|list|str:
  deserializers = {
    'application/json': deserialize_json,
    'application/yaml': deserialize_yaml,
   ...
  }
  match = mimeparse.best_match(deserializers.keys(), self.headers['content-type'])
  return deserializers[match](self.body)

In the above example best_match doesn't currently work reliably for suffixed types, because none of the types matches, even though we know json serializer can handle for example 'application/geo+json'.

edit: I've hopefully clarified my explanation with a piece of code.


So that's the client side, lets look at the server side.

Suppose the server can handle application/json and application/yaml resources. Client sends request with accept header application/geo+json. best_match should match it to application/json and not just because type ('application') matches.

rafalkrupinski avatar Aug 28 '24 13:08 rafalkrupinski

To be fair, Falcon itself has a hack in one place to support +json and +xml, so it is not an unreasonable request.

Right now, Falcon's media handling doesn't support this scenario either for request media, but maybe we should have an option too :thinking: (Falcon's media is very similar to your serializers and deserializers, just on the server side.)

vytas7 avatar Nov 10 '24 09:11 vytas7

@vytas7 so how do you want to continue?

rafalkrupinski avatar Nov 20 '24 21:11 rafalkrupinski

Just to be honest, myself, I don't have much incentive to introduce any new major features to this library, but we can accept your suggestions or contributions :)

The first step, I think, is to agree on a good API design.

Your proposal, as I understand, is to have an option to parse_mime_type(...) to completely disregard the subtype, and replace it with the suffix, if any is found by splitting on +?

How would that be propagated from the matching functions (are you using best_match())? And how would you handle scenarios when there is a dedicated handler for the specific media type, e.g., application/geo+json, and also for the generic application/json? Just bluntly stripping the subtype might broadcast them to the same thing, making these handlers weigh equal :thinking: Or were you thinking of a two-pass approach where you would try to match with +suffix in the case nothing could be matched without?

vytas7 avatar Nov 20 '24 22:11 vytas7