OpenAPI-Specification

v3.2: Support ordered multipart including streaming

Open handrews opened this issue 9 months ago • 12 comments

Fixes:

  • #3721 (multipart/mixed in general)
  • #3725 (multipart/byteranges)
  • https://github.com/OAI/OpenAPI-Specification/discussions/4171#discussioncomment-12938067 (streaming application/json with multipart/mixed)

This adds support for all multipart media types that do not have named parts, including support for streaming such media types. Note that multipart/mixed defines the basic processing rules for all multipart types, and implementations that encounter unrecognized multipart subtypes are required to process them as multipart/mixed. Therefore support for multipart/mixed addresses all other subtypes to some degree.

This builds on the recent support for sequential media types:

  • multipart/mixed and similar meet the definition for a sequential media type, requiring it to be modeled as an array. This does use an expansive definition of "repeating the same structure", where the structure is literally any content with a media type.
  • As a sequential media type, it also supports itemSchema
  • Adding a parallel itemEncoding is the obvious solution to multipart/mixed streams requiring an Encoding Object
  • We have regularly received requests to support truly mixed multipart/mixed payloads, and previously claimed such support from 3.0.0 onwards, without actually supporting it. Adding prefixEncoding along with itemEncoding supports this use case with a clear parallel to prefixItems, which is the schema construct needed to support this case.
  • There is no need for a prefixSchema field because the streaming use case requires a repetition of the same schema for each item. Therefore all mixed use cases can use schema and prefixItems.

  • [X] schema changes are included in this pull request
  • [ ] schema changes are needed for this pull request but not done yet
  • [ ] no schema changes are needed for this pull request
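
As a rough illustration of the prefixEncoding / itemEncoding parallel with prefixItems described above, the lookup for a given part could be sketched as follows. The `media_type` dict and the `encoding_for_part` helper are hypothetical, and the rule that positions beyond prefixEncoding use itemEncoding is an assumption made for this sketch, not a quotation from the specification text.

```python
from typing import Any, Optional

# Hypothetical Media Type Object for a multipart/mixed body, written as a
# Python dict. The first two parts are described positionally; any further
# parts share one Encoding Object.
media_type: dict[str, Any] = {
    "schema": {
        "type": "array",
        "prefixItems": [
            {"type": "object"},  # part 0: JSON metadata
            {},                  # part 1: no type constraint
        ],
        "items": {"type": "string"},
    },
    "prefixEncoding": [
        {},                                  # part 0: rely on defaults
        {"contentType": "application/zip"},  # part 1
    ],
    "itemEncoding": {"contentType": "text/plain"},
}


def encoding_for_part(
    media_type: dict[str, Any], index: int
) -> Optional[dict[str, Any]]:
    """Pick the Encoding Object for the part at ``index``.

    Assumption: positions covered by prefixEncoding use it and later
    positions use itemEncoding, mirroring how prefixItems and items
    divide an array between them.
    """
    prefix = media_type.get("prefixEncoding", [])
    if index < len(prefix):
        return prefix[index]
    return media_type.get("itemEncoding")
```

With this sketch, `encoding_for_part(media_type, 1)` selects the positional entry, while any index of 2 or more selects the shared itemEncoding.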

We do not seem to run tests on the 3.2 schemas, and I couldn't quickly figure out how to add that, so we should do that separately and include coverage for this and other new fields.

Also paging @thecheatah, @jeremyfiel

handrews avatar May 15 '25 18:05 handrews

Thanks @handrews for taking this on. I'm really happy to see it coming to fruition and hopefully the tooling catches up with it sooner rather than later.

I couldn't immediately make out if this would support nested multipart.

POST  /things HTTP/1.1
content-type: multipart/mixed;boundary=aaa

--aaa
content-type: application/json

{ 
   "data": ""
}
--aaa
content-type: multipart/mixed;boundary=bbb

        --bbb
        content-type: application/json

        {
            "more_data": ""
        }
        --bbb
        content-type: text/plain

        test file
        --bbb
        content-type: application/zip

        <binary data>
        --bbb
        content-type: application/pdf

        <binary data>
        --bbb--
--aaa--

multipart/mixed:
  schema:
    prefixItems:
      - type: object
        properties:
          data:
            type: string
      - prefixItems:
          - type: object
            properties:
              more_data:
                type: string
          - {}
          - {}
          - {}
  prefixEncoding:
    - {}
    - contentType: multipart/mixed
      # not sure how to further document a nested structure here.
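
For readers who want to see what the nested message above looks like once parsed, here is a minimal sketch using Python's standard `email` package, which handles nested multipart bodies. The `RAW` bytes are a shortened, hypothetical version of the request body (two inner parts instead of four), and `outline` is an illustrative helper, not part of any OpenAPI tooling.

```python
from email import message_from_bytes
from email.message import Message

# Shortened version of the nested body: one JSON part, then an inner
# multipart/mixed part with its own boundary. Headers are separated from
# each part body by an empty line.
RAW = (
    b"Content-Type: multipart/mixed; boundary=aaa\r\n"
    b"\r\n"
    b"--aaa\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"data": ""}\r\n'
    b"--aaa\r\n"
    b"Content-Type: multipart/mixed; boundary=bbb\r\n"
    b"\r\n"
    b"--bbb\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"more_data": ""}\r\n'
    b"--bbb\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"test file\r\n"
    b"--bbb--\r\n"
    b"--aaa--\r\n"
)


def outline(part: Message):
    """Return the content type of a leaf part, or a list for a multipart part."""
    if part.is_multipart():
        return [outline(child) for child in part.get_payload()]
    return part.get_content_type()


tree = outline(message_from_bytes(RAW))
```

The resulting `tree` is a nested list, which is the same shape that the nested prefixItems in the schema above tries to describe.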

jeremyfiel avatar May 27 '25 20:05 jeremyfiel

@jeremyfiel aww... I was hoping no one would bring up nested multipart... 😵‍💫

I think it would be hard to do that, because there isn't anywhere to put the nested Encoding Object. I think we'd have to add encoding, prefixEncoding, and itemEncoding to the Encoding Object as well as the Media Type Object. I'm a bit hesitant to do that, but we could talk about it at the Thursday call and I could submit it as a follow-up if it gains traction.

Alternatively, we could recommend trying that as an extension given that it adds significant complexity and is a rare case that is deprecated by the current RFC (I know that's small consolation when you're the "rare case" and built things in good faith using older RFCs when they were current).

The complexity is not just the recursive structure, but also that you are now correlating two separate trees of structure.
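
To make the "two separate trees" point concrete, here is a purely hypothetical sketch of what a tool would have to do if the Encoding Object were allowed to carry its own prefixEncoding. Nothing here is defined by the specification: the nested `prefixEncoding` inside an Encoding Object, the `pair_trees` helper, and the sample data are all assumptions for illustration.

```python
from typing import Any, Iterator

Path = tuple[int, ...]


def pair_trees(
    schema: dict[str, Any], encoding: dict[str, Any], path: Path = ()
) -> Iterator[tuple[Path, dict[str, Any], dict[str, Any]]]:
    """Walk the schema tree and the (hypothetical) encoding tree in lockstep."""
    yield path, schema, encoding
    sub_encodings = encoding.get("prefixEncoding", [])  # hypothetical nesting
    for i, sub_schema in enumerate(schema.get("prefixItems", [])):
        # A missing positional entry is treated as an empty Encoding Object.
        sub_encoding = sub_encodings[i] if i < len(sub_encodings) else {}
        yield from pair_trees(sub_schema, sub_encoding, path + (i,))


# Sample data modeled on the nested example earlier in the thread.
schema = {
    "prefixItems": [
        {"type": "object"},
        {"prefixItems": [{"type": "object"}, {}, {}, {}]},
    ]
}
encoding = {
    "prefixEncoding": [
        {},
        {
            "contentType": "multipart/mixed",
            "prefixEncoding": [
                {},
                {"contentType": "text/plain"},
                {"contentType": "application/zip"},
                {"contentType": "application/pdf"},
            ],
        },
    ]
}

pairs = {path: enc for path, _schema, enc in pair_trees(schema, encoding)}
```

Even in this small sketch, every level needs a rule for what happens when the two trees disagree in length or shape, which is part of the complexity being described.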

handrews avatar May 27 '25 20:05 handrews

I'm not entirely sure it is correct to include multipart/mixed in this statement. It is registered in the IANA registry, and it does technically have an envelope in the form of the boundary parameter.

Sequential Media Types

Within this specification, a sequential media type is defined as any media type that consists of a repeating structure, without any sort of header, footer, envelope, or other metadata in addition to the sequence. Some examples of sequential media types (including some that are not IANA-registered but are in common use) are:

  application/jsonl
  application/x-ndjson
  application/json-seq
  application/geo+json-seq
  text/event-stream
  multipart/mixed

jeremyfiel avatar May 27 '25 20:05 jeremyfiel

[EDIT: This goes with the nested multipart discussion]

@jeremyfiel the problem is that instead of just re-using the Media Type Object, we came up with the contentType field :-(

handrews avatar May 27 '25 20:05 handrews

I totally understand the complexity, just trying to confirm my initial impression.

jeremyfiel avatar May 27 '25 20:05 jeremyfiel

@jeremyfiel That statement only says that some of the listed types are not registered. application/json-seq, application/geo+json-seq, and multipart/mixed are all registered.

I decided not to get into the preamble and postamble of multipart because AFAICT they're supposed to be ignored and are there for historical purposes. Media type parameters are not part of the actual media type content, and the boundaries in the content are no more (or less) significant than the various differences in the three sequential JSON media type delimiters.
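
As a small illustration of how little the delimiters matter to the resulting data, the sketch below parses a newline-delimited body (as used by application/jsonl and application/x-ndjson) and an application/json-seq body, where each text is prefixed with the RS control character, into the same array. The function names are illustrative only, and error handling is left out.

```python
import json

RS = "\x1e"  # record separator that prefixes each text in application/json-seq


def parse_json_lines(body: str) -> list:
    """Parse newline-delimited JSON, one value per line."""
    # Split on "\n" only: str.splitlines() would also split on other
    # separator characters that may appear unescaped inside JSON strings.
    return [json.loads(line) for line in body.split("\n") if line.strip()]


def parse_json_seq(body: str) -> list:
    """Parse a JSON text sequence, where each value is introduced by RS."""
    return [json.loads(chunk) for chunk in body.split(RS) if chunk.strip()]


lines_body = '{"a": 1}\n{"a": 2}\n'
seq_body = '\x1e{"a": 1}\n\x1e{"a": 2}\n'
```

Both bodies produce the same list of two objects, which is the sense in which each format is "just" a sequence of items.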

handrews avatar May 27 '25 20:05 handrews

@jeremyfiel I added some clarifications about the envelope/preamble/epilogue and the lack of nesting support.

handrews avatar May 27 '25 20:05 handrews

This force-push was just a plain re-base with no conflicts or other changes. Exactly the same commits applied, I just wanted to make sure the other big PRs wouldn't cause merge issues.

@jeremyfiel GitHub won't let me request a review from you, but if you could provide an approval when you are satisfied with the PR it would be much appreciated as you probably have more expertise with this than just about anyone else.

@thecheatah if you are able to review, even just for the streaming support part, that would also be greatly appreciated. I did not use application/json in the streaming multipart example, but the principle would be the same.

handrews avatar May 30 '25 17:05 handrews

@ralfhandl I have fixed the sentence ordering, and also added a new section, Encoding and type [see commits below- initial push failed and I didn't notice at first], that clarifies how to handle detecting the "schema type" that gets mentioned in many places but is never explained. I suspect that originally the expectation was that the Schema Object under the schema field in the Media Type Object (adjacent to the Encoding Objects' parent encoding field) would be an inline schema with inline properties subschemas, or at most a single $ref directly under schema.

This is not realistic, so I think that setting some boundaries on what is expected is required. I stuck with requiring (MUST) only the most unambiguous scenario, although I considered including the search-order support for multi-valued type keywords as a MUST. Or at least if it is two values and the second value is "null". But you can get into some weird corner cases when you are doing that so it felt better to banish it under the "MAY choose to implement more complex things." Although I also considered pulling that part out of the MAY and making it a SHOULD on its own. Opinions here would be much appreciated.

This, btw, is a prerequisite for supporting nested multipart/mixed as the problem becomes much more complex the further down you go in nested objects and arrays. But this is really needed already, whether we support nesting or not, so I decided to add it to this PR.
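
A minimal sketch of the two tiers discussed in this comment might look like the following. The `detect_type` helper and its `allow_nullable` flag are hypothetical names; the sketch assumes the unambiguous case is an inline, single-string `type`, treats the two-value form ending in "null" as the optional extra, and does not attempt `$ref` resolution or any other schema analysis.

```python
from typing import Any, Optional


def detect_type(schema: dict[str, Any], allow_nullable: bool = False) -> Optional[str]:
    """Return the schema's type if it can be determined without ambiguity.

    Baseline: only a single string-valued ``type`` keyword is recognized.
    Optional: with ``allow_nullable``, a two-element list whose second
    value is "null" is reduced to its first value.
    """
    declared = schema.get("type")
    if isinstance(declared, str):
        return declared
    if (
        allow_nullable
        and isinstance(declared, list)
        and len(declared) == 2
        and isinstance(declared[0], str)
        and declared[1] == "null"
    ):
        return declared[0]
    return None  # absent, multi-valued, or otherwise not detectable here
```

For example, `detect_type({"type": ["string", "null"]})` gives no answer under the baseline rule but yields "string" when `allow_nullable=True`.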

handrews avatar Jun 09 '25 17:06 handrews

@ralfhandl oops, push had failed and I hadn't noticed. The section mentioned in my last comment is now actually added, sorry about that.

handrews avatar Jun 09 '25 17:06 handrews

@ralfhandl added more tests, merged your suggestions (and yeah "falling back" is probably too idiomatic to be used here)

handrews avatar Jun 13 '25 16:06 handrews

Also rebased to pick up the test-running changes. No modifications in the rebase force-push

handrews avatar Jun 13 '25 16:06 handrews

The most recent commit improves one example based on this comment, and also adds some Security Considerations related to it.

handrews avatar Jun 18 '25 22:06 handrews

As this has grown with more guidance and examples, plus many comment threads, it has gotten unwieldy. I have split it into the following PRs to replace this one:

  • #4743 (an improved approach to analyzing schemas for type and other things, as well as handling mixed JSON/binary data in schema evaluation, which is more closely related than it sounds)
  • #4744 (guidance for when contentType is a list, since as @jeremyfiel noted, that brings up some complexities that we have never addressed, and it will come up more with ordered multipart as seen in one of the examples)
  • #4745 (all of the content of this PR that is neither an example nor part of the two previous PRs in this list- no changes other than splitting it up)
  • #4746 (the examples, which are provoking involved enough discussions that they should be separate, plus they will be affected by pending Example Object changes so it's easier to manage them separately- includes the recently revised multipart/related example that was particularly provocative)

handrews avatar Jun 21 '25 01:06 handrews