Specify and Register Media Type(s)
Neither the spec docs nor the Java source include recommended media type(s) for Ion (e.g. application/ion [; charset=[utf-8, utf-16, utf-32]]). A media type should be standardized and registered with IANA.
Proposal for IANA submission:
Type name: application
Subtype name: ion
Required parameters: none
Optional parameters: charset
See http://amznlabs.github.io/ion-docs/text.html. Valid encodings are UTF-8, UTF-16, and UTF-32 for text encoded Ion. Binary encoded Ion must not specify a charset. The default charset value is UTF-8.
Encoding considerations: Ion may be encoded in either text (as UTF-8, UTF-16, or UTF-32 encoded Unicode code points) or binary, either of which may be compressed in a Gzip wrapper. When text encoding is used, the charset parameter may be used to specify the character encoding.
Security considerations: Ion data may contain application-specific S-Expressions, which are interpreted at the discretion of consumers. The encoded data itself poses no security risk, but bugs in consuming applications may provide vectors for security risks.
Interoperability considerations: Ion has been shown to be interoperable across applications and platforms for import and export from several implementations. In practice, binary and UTF-8 encoded text Ion (both gzipped and not) are the most widely used and tested encodings.
Published specification: http://amznlabs.github.io/ion-docs/spec.html
Applications that use this media type: Ion is device-, platform-, and vendor-neutral and is supported by generic and task-specific applications and a wide range of generic Ion tools (editors, parsers, Web agents, …).
Additional information:
- Magic number(s): Binary encoded Ion datagrams begin with a binary version marker consisting of the sequence e0 01 00 ea. Gzipped Ion begins with the sequence 1f 8b. Text encoded Ion may contain a Unicode BOM to indicate its encoding.
- File extension(s): .ion, .10n, .ion.gz, .10n.gz
- Macintosh file type code(s): TEXT (for text encoded Ion), ION (for any)
Person and email address for further information: pending
Intended usage: COMMON
Restrictions on usage: none
Author: pending
Change controller: The Ion specification is a work product of Amazon.com, Inc.
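The charset rule in the proposal above (text may name UTF-8, UTF-16, or UTF-32; binary must not name a charset) can be sketched as a small header builder. This is only an illustration: `ion_content_type` and `VALID_TEXT_CHARSETS` are hypothetical names, not part of any Ion library, and the choice to omit the parameter when no charset is given is an assumption based on the stated UTF-8 default.

```python
from typing import Optional

# Charsets the proposal allows for text encoded Ion.
VALID_TEXT_CHARSETS = {"utf-8", "utf-16", "utf-32"}


def ion_content_type(binary: bool, charset: Optional[str] = None) -> str:
    """Build a Content-Type value for the proposed application/ion type.

    Hypothetical helper: it only encodes the charset rule from the proposal.
    """
    if binary:
        if charset is not None:
            # The proposal says binary Ion must not specify a charset.
            raise ValueError("binary Ion must not specify a charset")
        return "application/ion"
    if charset is None:
        # Assumption: omit the parameter and rely on the UTF-8 default.
        return "application/ion"
    normalized = charset.lower()
    if normalized not in VALID_TEXT_CHARSETS:
        raise ValueError(f"unsupported charset for text Ion: {charset}")
    return f"application/ion; charset={normalized}"
```

Note that under this sketch a bare `application/ion` is produced for both binary Ion and default-charset text Ion, so the header alone does not distinguish them.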
Jon, this is really awesome--I think we should definitely register the media type.
I would like us to consider using this draft as the basis for a submission. As discussed separately, that means we would need you to sign a CLA (the ball is in our court; I'm working through the process).
In the absence of a registered type, perhaps application/vnd.amazon.ion?
Quick update: I've submitted this registration to IANA ([IANA #967506])
Name: Jonathan Hohle
Email: [email protected]
Media type name: application
Media subtype name: ion
Required parameters: N/A
Optional parameters:
charset:
See http://amznlabs.github.io/ion-docs/text.html. Valid encodings are UTF-8, UTF-16, and UTF-32 for text encoded Ion. Binary encoded Ion must not specify a charset. The default charset value is UTF-8.
Encoding considerations: binary
Ion may be encoded in either text (as UTF-8, UTF-16, or UTF-32 encoded Unicode code points) or binary, either of which may be compressed in a Gzip wrapper. When text encoding is used, the charset parameter may be used to specify the character encoding.
Security considerations:
Ion data may contain application-specific S-Expressions, which are interpreted at the discretion of consumers. The encoded data itself poses no security risk, but bugs in consuming applications may provide vectors for security risks.
Interoperability considerations:
Ion has been shown to be interoperable across applications and platforms for import and export from several implementations. In practice, binary and UTF-8 encoded text Ion (both gzipped and not) are the most widely used and tested encodings.
Published specification:
http://amznlabs.github.io/ion-docs/spec.html
Applications which use this media:
Ion is device-, platform-, and vendor-neutral and is supported by generic and task-specific applications and a wide range of generic Ion tools (editors, parsers, Web agents, …).
Fragment identifier considerations:
N/A
Restrictions on usage:
N/A
Provisional registration? (standards tree only):
This is the initial registration for application/ion. While registration is pending, clients may choose to use application/vnd.amazon.ion; however, this is not part of or defined in the standard.
Additional information:
1. Deprecated alias names for this type: N/A
2. Magic number(s): 0xe0 0x01 0x00 0xea
3. File extension(s): .ion, .10n, .ion.gz, .10n.gz
4. Macintosh file type code: TEXT
5. Object Identifiers: N/A
General Comments:
Binary encoded Ion datagrams begin with a binary version marker consisting of the sequence e0 01 00 ea. Gzipped Ion begins with the sequence 1f 8b. Text encoded Ion may contain a Unicode BOM to indicate its encoding.
Ion issue tracking this registration: https://github.com/amzn/ion-java/issues/82
Person to contact for further information:
1. Name: Jonathan Hohle
2. Email: [email protected]
Intended usage: Common
From the spec:
Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of business data which can survive multiple generations of software evolution.
Ion was built to solve the rapid development, decoupling, and efficiency challenges faced every day while engineering large-scale, service-oriented architectures. Ion has been addressing these challenges within Amazon for nearly a decade, and we believe others will benefit as well.
Author/Change controller: The Ion specification is a work product of Amazon.com, Inc.
I received a reply from IANA. My impression is that an RFC will be required for inclusion in the standards tree, but a lower barrier to entry may be possible for the vendor tree (for, as @benkehoe noted, application/vnd.amazon.ion).
I'm going to attempt to squeeze and shuffle the existing documentation into an RFC document. I'll reply when I have something ready for review within the Ion community.
Did this go anywhere?
I spent a lot of time converting the Ion spec docs to RFC-formatted troff, but then life got in the way and I was unable to continue. I've pushed the current state of that work here: https://github.com/hohle/ion-rfc. This is a snapshot of the spec contents from 2017, so there may be sections that are out of date.
It appears application/ion is still included in the Provisional Standard Media Type Registry. I'm happy to continue the process if others are willing to review the docs.
Valid encodings are UTF-8, UTF-16, and UTF-32 for text encoded Ion. Binary encoded Ion must not specify a charset. The default charset value is UTF-8.
Encoding considerations: Ion may be encoded in either text (as UTF-8, UTF-16, or UTF-32 encoded Unicode code points) or binary, either of which may be compressed in a Gzip wrapper. When text encoding is used, the charset parameter may be used to specify the character encoding.
This seems primarily focused on the semantics of the application/ion media type with respect to the Content-Type of a message. If the Content-Type does not include a charset, then the content could be either text or binary Ion, but the Ion encoding is not ambiguous, since the text vs. binary distinction can be determined by inspecting the first few bytes of the content.
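For illustration, the first-bytes check could look roughly like the sketch below, using the magic numbers listed in the registration (e0 01 00 ea for binary, 1f 8b for gzip) plus a Unicode BOM check for text. `sniff_ion` and its return values are hypothetical names, and falling back to UTF-8 when no BOM is present is an assumption taken from the stated default charset; BOM-less UTF-16 or UTF-32 text would be misreported.

```python
import codecs
from typing import Optional, Tuple

BINARY_VERSION_MARKER = bytes([0xE0, 0x01, 0x00, 0xEA])
GZIP_MAGIC = bytes([0x1F, 0x8B])

# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# (ff fe 00 00) begins with the UTF-16 LE BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_ion(head: bytes) -> Tuple[str, Optional[str]]:
    """Classify the leading bytes of a payload as (kind, charset).

    kind is "gzip", "binary", or "text"; charset is set only for text.
    """
    if head.startswith(GZIP_MAGIC):
        # The wrapped payload must be decompressed and sniffed again.
        return ("gzip", None)
    if head.startswith(BINARY_VERSION_MARKER):
        return ("binary", None)
    for bom, charset in _BOMS:
        if head.startswith(bom):
            return ("text", charset)
    # Assumption: no BOM means the default charset, UTF-8.
    return ("text", "utf-8")
```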
However, this seems like it may need further clarification with respect to the Accept header and content negotiation. Specifically, Accept: application/ion does not seem to indicate whether text or binary is preferred, and since this refers to content that has not been sent yet, we cannot use the first few bytes of the content to disambiguate.
If I had to guess, I'd assume that including a charset in the Accept header tells the server that the client wants Ion text, but it's not clear whether Accept: application/ion means that the client is willing to accept any Ion encoding or that it specifically wants binary.
Maybe I'm seeing a problem where there is none. (Forgive me, if that's the case.) However, if it is an issue then we should probably specify something about this.
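To make the ambiguity concrete, here is a minimal sketch of how a server might interpret the Accept header under the guess above: a charset parameter means text, and a bare application/ion falls back to a server-chosen default. This is one possible reading, not something the registration defines. `preferred_ion_encoding` and its `default` argument are hypothetical, and q-values and wildcards are deliberately ignored.

```python
from typing import Optional


def preferred_ion_encoding(accept: str, default: str = "binary") -> Optional[str]:
    """Return "text", the server default, or None if Ion is not accepted.

    Assumed policy: a charset parameter on application/ion signals text.
    """
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/ion":
            continue
        params = {}
        for param in parts[1:]:
            key, sep, value = param.partition("=")
            if sep:
                params[key.strip().lower()] = value.strip().strip('"')
        if "charset" in params:
            return "text"
        # Bare application/ion: the open question in this comment.
        return default
    return None
```

The `default` argument is exactly the part that is unspecified today: whether a bare `Accept: application/ion` means "any Ion encoding" or "binary".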
In practice, I'm not sure there is a problem unless there are systems that don't support the full cross product of binary/text and compressed/uncompressed Ion. I've not encountered any implementations that do not support some way of providing a byte stream that contains any of those without prior knowledge of the encoding. My preference would be to not over-specify, if possible.
I've uploaded an updated gist. I'm going to do a few more rounds of proofreading to catch my own errors and make sure this is up to date with ion-docs, to the best of my ability.