v3.2: Support ordered multipart including streaming
Fixes:
- #3721 (multipart/mixed in general)
- #3725 (multipart/byteranges)
- https://github.com/OAI/OpenAPI-Specification/discussions/4171#discussioncomment-12938067 (streaming application/json with multipart/mixed)
This adds support for all multipart media types that do not have named parts, including support for streaming such media types. Note that multipart/mixed defines the basic processing rules for all multipart types, and implementations that encounter unrecognized multipart subtypes are required to process them as multipart/mixed. Therefore support for multipart/mixed addresses all other subtypes to some degree.
This builds on the recent support for sequential media types:
- multipart/mixed and similar meet the definition for a sequential media type, requiring it to be modeled as an array. This does use an expansive definition of "repeating the same structure", where the structure is literally any content with a media type.
- As a sequential media type, it also supports itemSchema
- Adding a parallel itemEncoding is the obvious solution to multipart/mixed streams requiring an Encoding Object
- We have regularly received requests to support truly mixed multipart/mixed payloads, and previously claimed such support from 3.0.0 onwards, without actually supporting it. Adding prefixEncoding along with itemEncoding supports this use case with a clear parallel to prefixItems, which is the schema construct needed to support this case.
- There is no need for a prefixSchema field because the streaming use case requires a repetition of the same schema for each item. Therefore all mixed use cases can use schema and prefixItems
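As a rough illustration of the parallel with prefixItems described above, here is a minimal sketch of how a tool might pick the Encoding Object for each part by position. The function name `encoding_for_part` and the sample Encoding Objects are hypothetical, and the fallback from prefixEncoding to itemEncoding is an assumption based on that parallel, not a quotation of the specification text.

```python
from typing import Any, Optional

# Hypothetical alias: an Encoding Object modeled as a plain dict.
EncodingObject = dict[str, Any]


def encoding_for_part(
    index: int,
    prefix_encoding: list[EncodingObject],
    item_encoding: Optional[EncodingObject],
) -> Optional[EncodingObject]:
    """Pick the Encoding Object for the part at a 0-based position.

    Assumption: positions covered by prefixEncoding use the entry at the
    same index, and every later part uses itemEncoding (if present),
    mirroring how prefixItems relates to items in a schema.
    """
    if index < len(prefix_encoding):
        return prefix_encoding[index]
    return item_encoding


# Illustrative values only: one leading JSON part, then repeated text parts.
prefix_encoding = [{"contentType": "application/json"}]
item_encoding = {"contentType": "text/plain"}
selected = [encoding_for_part(i, prefix_encoding, item_encoding) for i in range(3)]
```

With only itemEncoding set (an empty prefixEncoding list), every part gets the same Encoding Object, which corresponds to the streaming use case.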
- [X] schema changes are included in this pull request
- [ ] schema changes are needed for this pull request but not done yet
- [ ] no schema changes are needed for this pull request
We do not seem to run tests on the 3.2 schemas, and I couldn't quickly figure out how to add that, so we should do that separately and include coverage for this and other new fields.
Also paging @thecheatah, @jeremyfiel
Thanks @handrews for taking this on. I'm really happy to see it coming to fruition and hopefully the tooling catches up with it sooner than later.
I couldn't immediately make out if this would support nested multipart.
POST /things HTTP/1.1
content-type: multipart/mixed;boundary=aaa

--aaa
content-type: application/json

{
"data": ""
}
--aaa
content-type: multipart/mixed;boundary=bbb

--bbb
content-type: application/json

{
"more_data": ""
}
--bbb
content-type: text/plain

test file
--bbb
content-type: application/zip

<binary data>
--bbb
content-type: application/pdf

<binary data>
--bbb--
--aaa--
multipart/mixed:
  schema:
    prefixItems:
      - type: object
        properties:
          data:
            type: string
      - prefixItems:
          - type: object
            properties:
              more_data: ""
          - {}
          - {}
          - {}
  prefixEncoding:
    - {}
    - contentType: multipart/mixed
      # not sure how to further document a nested structure here.
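To make the shape of the nested case concrete, here is a small sketch that parses a cut-down version of the message above with Python's standard `email` package and prints the tree of part media types. The `RAW` payload and the `outline` helper are illustrative assumptions (binary parts are left out to keep it short); this only shows the wire-level nesting, not how an OpenAPI description would model it.

```python
from email import message_from_string
from email.message import Message

# Cut-down version of the nested message: text parts only.
RAW = """Content-Type: multipart/mixed; boundary=aaa

--aaa
Content-Type: application/json

{"data": ""}
--aaa
Content-Type: multipart/mixed; boundary=bbb

--bbb
Content-Type: application/json

{"more_data": ""}
--bbb
Content-Type: text/plain

test file
--bbb--
--aaa--
"""


def outline(msg: Message, depth: int = 0) -> list[str]:
    """Return one indented line per part, recursing into nested multiparts."""
    lines = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        for part in msg.get_payload():
            lines.extend(outline(part, depth + 1))
    return lines


tree = outline(message_from_string(RAW))
print("\n".join(tree))
```

The second top-level part is itself an ordered list of parts, so describing it needs a second level of per-position encoding information, not just the single contentType shown in the sketch above.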
@jeremyfiel aww... I was hoping no one would bring up nested multipart... 😵💫
I think it would be hard to do that, because there isn't anywhere to put the nested Encoding Object. I think we'd have to add encoding, prefixEncoding, and itemEncoding to the Encoding Object as well as the Media Type Object. I'm a bit hesitant to do that, but we could talk about it at the Thursday call and I could submit it as a follow-up if it gains traction.
Alternatively, we could recommend trying that as an extension given that it adds significant complexity and is a rare case that is deprecated by the current RFC (I know that's small consolation when you're the "rare case" and built things in good faith using older RFCs when they were current).
The complexity is not just the recursive structure, but also that you are now correlating two separate trees of structure.
I'm not entirely sure it is correct to include multipart/mixed in this statement. It is registered in the IANA registry and it does technically have an envelope with the boundary parameter.
Sequential Media Types
Within this specification, a sequential media type is defined as any media type that consists of a repeating structure, without any sort of header, footer, envelope, or other metadata in addition to the sequence. Some examples of sequential media types (including some that are not IANA-registered but are in common use) are:
- application/jsonl
- application/x-ndjson
- application/json-seq
- application/geo+json-seq
- text/event-stream
- multipart/mixed
[EDIT: This goes with the nested multipart discussion]
@jeremyfiel the problem is that instead of just re-using the Media Type Object, we came up with the contentType field :-(
I totally understand the complexity, just trying to confirm my initial impression.
@jeremyfiel That statement only says that some of the listed types are not registered. application/json-seq, application/geo+json-seq, and multipart/mixed are all registered.
I decided not to get into the preamble and postamble of multipart because AFAICT they're supposed to be ignored and are there for historical purposes. Media type parameters are not part of the actual media type content, and the boundaries in the content are no more (or less) significant than the various differences in the three sequential JSON media type delimiters.
@jeremyfiel I added some clarifications about the envelope/preamble/epilogue and the lack of nesting support.
This force-push was just a plain re-base with no conflicts or other changes. Exactly the same commits applied, I just wanted to make sure the other big PRs wouldn't cause merge issues.
@jeremyfiel GitHub won't let me request a review from you, but if you could provide an approval when you are satisfied with the PR it would be much appreciated as you probably have more expertise with this than just about anyone else.
@thecheatah if you are able to review, even just for the streaming support part, that would also be greatly appreciated. I did not use application/json in the streaming multipart example, but the principle would be the same.
@ralfhandl I have fixed the sentence ordering, and also added a new section, Encoding and type [see commits below- initial push failed and I didn't notice at first], that clarifies how to handle detecting the "schema type" that gets mentioned in many places but is never explained. I suspect that originally the expectation was that the Schema Object under the schema field in the Media Type Object (adjacent to the Encoding Objects' parent encoding field) would be an inline schema with inline properties subschemas, or at most a single $ref directly under schema.
This is not realistic, so I think that setting some boundaries on what is expected is required. I stuck with requiring (MUST) only the most unambiguous scenario, although I considered including the search-order support for multi-valued type keywords as a MUST, or at least the case where there are two values and the second value is "null". But you can get into some weird corner cases when you are doing that, so it felt better to banish it under the "MAY choose to implement more complex things." I also considered pulling that part out of the MAY and making it a SHOULD on its own. Opinions here would be much appreciated.
This, btw, is a prerequisite for supporting nested multipart/mixed, as the problem becomes much more complex the further down you go in nested objects and arrays. But this is really needed already, whether we support nesting or not, so I decided to add it to this PR.
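For readers trying to picture the trade-off, here is a rough sketch of the two tiers discussed above. It assumes that "the most unambiguous scenario" means an inline type keyword with a single string value, which the comment does not spell out; the function names `strict_type` and `lenient_type` are hypothetical and do not come from the PR.

```python
from typing import Any, Optional


def strict_type(schema: dict[str, Any]) -> Optional[str]:
    """Required tier (assumed): only a single, inline string-valued type counts."""
    value = schema.get("type")
    return value if isinstance(value, str) else None


def lenient_type(schema: dict[str, Any]) -> Optional[str]:
    """Optional tier (assumed): also accept [X, "null"] and report X."""
    value = schema.get("type")
    if isinstance(value, str):
        return value
    if (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and value[1] == "null"
    ):
        return value[0]
    # Anything else ($ref, allOf, longer type lists, ...) is left undetermined.
    return None


nullable_array = {"type": ["array", "null"]}
print(strict_type(nullable_array), lenient_type(nullable_array))
```

Neither function follows $ref or looks inside applicator keywords, which is where the corner cases mentioned above start to appear.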
@ralfhandl oops, push had failed and I hadn't noticed. The section mentioned in my last comment is now actually added, sorry about that.
@ralfhandl added more tests, merged your suggestions (and yeah "falling back" is probably too idiomatic to be used here)
Also rebased to pick up the test-running changes. No modifications in the rebase force-push
The most recent commit improves one example based on this comment, and also adds some Security Considerations related to it.
As this has grown with more guidance and examples, plus many comment threads, it has gotten unwieldy. I have split it into the following PRs to replace this one:
- #4743 (an improved approach to analyzing schemas for type and other things, as well as handling mixed JSON/binary data in schema evaluation, which is more closely related than it sounds)
- #4744 (guidance for when contentType is a list, since as @jeremyfiel noted, that brings up some complexities that we have never addressed, and it will come up more with ordered multipart as seen in one of the examples)
- #4745 (all of the content of this PR that is neither an example nor part of the two previous PRs in this list; no changes other than splitting it up)
- #4746 (the examples, which are provoking involved enough discussions that they should be separate, plus they will be affected by pending Example Object changes so it's easier to manage them separately; includes the recently revised multipart/related example that was particularly provocative)