mcap
Proposal for MIME type encoding
Public-Facing Changes
This PR proposes a new well-known encoding to the spec for MIME types.
Description
Channels that pass data without a schema may contain content with a known MIME type. This PR proposes `mime` as a supported channel message encoding, with the schema encoding name referencing the MIME type/subtype directly.
There may be other ways to achieve this that are preferred, but this seemed like a good way to start a conversation about it.
I think the “binary data” reference is confusing this a bit. Whether data is binary or utf8 or ascii doesn’t change the discussion.
That nit aside, the way I’ve been thinking about self-describing schemaless data such as h264 video is there would be no schema at all, and the channel encoding would be the IANA-registered MIME type.
I guess in my mind the message encoding is the mime type, rather than the message encoding being `mime`. The message encoding should tell you what format the binary data in that message is.
I think we originally tried using actual mime types for the existing standardized message encodings; I can't remember why that idea got dropped (probably because there is no mime type for "ros 1 message"). But it seems reasonable to add some other well-known ones for h264, jpg, etc.
As @jhurliman mentioned, there is no need for a schema for those formats (h264, jpg) because they are already self-describing.
Both of your comments helped me understand the difference between the channel message encoding and schema encoding better; thanks for that! It makes sense that the channel message encoding could be sufficient here. I guess I tried to use the schema to help disambiguate between the message encodings which aren't mime types and those that are - but that might not be necessary.
Yeah - it's probably not clear from the current set of recommended channel/schema encodings, but schemas are intended to be optional. For self-describing messages like json or jpeg there is no need for a schema.
The reason that channel and schema encodings are specified separately is that they don't always match 1:1. For example:
- `json` messages might use either https://json-schema.org/ or https://typeschema.org/ to define their schema.
- `cdr` messages (ROS 2) might use either `ros2msg` or `ros2idl` to define their schema.
Those combinations aren't supported in Foxglove Studio today, but we wanted the flexibility.
Some thoughts:
The purpose of the `message_encoding` field is to tell you what binary serialization format the message data is in - specifically with the goal of "how to deserialize it". Sometimes `message_encoding` is still not enough (for example protobuf), and you need an additional schema. In mcap files, the pairing of schema + message_encoding should be sufficient to deserialize the message data now and forever.
We could consider `mime` as one of those situations if we use `mime` for `message_encoding` - but then what type of Schema record do you pair it with? We could say that the schema_encoding would be `mime` or `text/plain` and the _schema_ is `image/jpeg`. Though what would be the `name`? That's one approach we could take.
The other is to leverage the schema-less feature of channel records and use a schema id of `0`. Then the `message_encoding` would need to be something like `image/jpeg` or `mime:image/jpeg`, whatever we decide.
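The two approaches can be sketched as record values. This is a minimal sketch, assuming plain dataclasses whose fields mirror the MCAP Schema and Channel records; the encoding strings are the hypothetical ones discussed above, not spec-blessed values:

```python
from dataclasses import dataclass


@dataclass
class Schema:
    id: int  # 0 is reserved for "no schema"
    name: str
    encoding: str
    data: bytes


@dataclass
class Channel:
    id: int
    schema_id: int  # references a Schema record, or 0 for schema-less
    topic: str
    message_encoding: str


# Approach 1: message_encoding is "mime", and a Schema record carries the
# media type (with the open question above: what goes in `name`?).
schema = Schema(id=1, name="image/jpeg", encoding="mime", data=b"")
channel_a = Channel(id=1, schema_id=schema.id, topic="/camera",
                    message_encoding="mime")

# Approach 2: schema-less channel (schema_id=0); the media type goes
# directly in message_encoding.
channel_b = Channel(id=2, schema_id=0, topic="/camera",
                    message_encoding="image/jpeg")
```

Approach 2 keeps everything needed for deserialization in a single field, at the cost of mixing media types and shorthand names in the same namespace.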
@defunctzombie yeah, your second paragraph was my reasoning for the PR as it was initially proposed. Since "mime" isn't an encoding I agree with the above consensus that a schema isn't necessary; the mime type itself should be sufficient as the message encoding. I do like the `mime:` prefix. I guess "media types" is the currently preferred nomenclature though: https://www.iana.org/assignments/media-types/media-types.xhtml
"Media Types (formerly known as MIME types)"
so perhaps `media:` as a prefix? All mime types are of the form `type/subtype`; there's always a slash. Potentially that is enough?
I'm in favor of just dropping the `mime:` / `media:` prefix, and specifying in the spec that implementations should interpret any unknown message encoding as a media type.
I think the only reason we didn't just use media types to specify message encoding is that there are none registered for the initial encodings we wanted to support (ros 1, ros 2 cdr, protobuf, flatbuffer) - only json is registered. But using them going forward seems sensible.
implementations should interpret any unknown message encoding as a media type
This seems like a strange fallback. What if next year we add support for a `ros3` message encoding? Old tools will treat that as "unknown media type"?
They would treat it like an unknown type, same as they do today.
If you want to be more specific, we could say that any message encoding containing a `/` is assumed to be a media type, and others should come from our shorthand list.
But also, if we add `ros3` next year maybe we should just use media type syntax going forward and use something like `application/x-ros3-msg`?
They would treat it like an unknown type, same as they do today.
I guess in a world where we use the media type syntax for all types, that would make sense.
Would we use media types for both message encoding and schema encoding (when a schema is used)?
maybe we should just use media type syntax going forward and use something like `application/x-ros3-msg`?
I thought `x-` wasn't a thing anymore 😅 https://www.rfc-editor.org/rfc/rfc6648.html
I thought x- wasn't a thing anymore 😅 https://www.rfc-editor.org/rfc/rfc6648.html
It seems like their answer is that `x-` is no longer necessary because they made the registration process easier. So if we are to copy that model, we should similarly recommend against using non-standard encodings, and instead encourage users to register any custom type they are using in our appendix (possibly with a `vnd.` prefix if it is company-specific).
Would we use media types for both message encoding and schema encoding (when a schema is used)?
I mean in theory what we are trying to do is already solved by media types. In practice, there is no registered media type for "protobuf filedescriptorset" or "jsonschema" or "concatenated ros1 msg files".
So it seems like we either need to go and register a bunch of media types, or we need to have some "override" shorthand values that are not registered media types.
Turning this into a concrete proposal, we could say something like:
The message_encoding and schema_encoding must be interpreted as either (a) if it does not contain a forward slash, a well-known encoding registered in our spec appendix, or (b) if it contains a forward slash, a well-known media type. Implementers are free to put non-standard data in the message or schema encoding fields, but are strongly encouraged to register their string in one of these two databases.
Thoughts?
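The rule in that proposal can be sketched as a small classifier. This is illustrative only: the `WELL_KNOWN_ENCODINGS` set below is a stand-in for the actual spec appendix, not its real contents:

```python
# Stand-in for the spec appendix's list of registered shorthand encodings.
WELL_KNOWN_ENCODINGS = {"ros1", "cdr", "protobuf", "flatbuffers", "json"}


def classify_encoding(value: str) -> str:
    """Classify a message_encoding/schema_encoding string per the proposal:
    (a) no slash -> a shorthand registered in the spec appendix,
    (b) contains a slash -> interpret as an IANA media type."""
    if "/" in value:
        return "media-type"
    if value in WELL_KNOWN_ENCODINGS:
        return "well-known"
    # Allowed by the proposal, but implementers are encouraged to register it.
    return "unregistered-shorthand"
```

Under this rule a hypothetical future `ros3` shorthand is simply unknown to old tools, rather than being misread as a media type.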
Some prior art from gRPC: a typical call uses `application/grpc+proto` (not registered with IANA).
From https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests:
`Content-Type → "content-type" "application/grpc" [("+proto" / "+json" / {custom})]`
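A tiny parser for that grammar, as a sketch; it treats the `{custom}` alternative as any token after `+` and is not part of any real gRPC library:

```python
def parse_grpc_content_type(value: str):
    """Split a gRPC-style content type into (base, suffix).

    Per the grammar quoted above, the value is "application/grpc"
    optionally followed by "+proto", "+json", or a custom "+<token>".
    Returns None if the value is not a gRPC content type.
    """
    base = "application/grpc"
    if value == base:
        return (base, None)
    if value.startswith(base + "+"):
        return (base, value[len(base) + 1:])
    return None
```

The `+suffix` convention is one way MCAP could attach a concrete serialization to a base type without registering every combination.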
Closing now that #563 has landed
@jhurliman is #563 separate from the ask here? #563 is a rename of our existing language. This PR wanted to explore expanding the spec to allow media_type as the message encoding for channels.
@alkasm as your use of mcap has evolved do you still think this issue is worth exploring?
@defunctzombie you're right that the attachment field name is not the same as this issue.
For now, we have standardized on accepting a mimetype as the message encoding, with no schema encoding, i.e. language similar to:
- Channel message_encoding: MUST be one of `protobuf`, `json`, or a mimetype
- Schema encoding: MUST be one of `protobuf` or `""` (empty string)
We're primarily using the mimetype for imagery or video data, e.g. video/h264 or image/jpeg, and also some raw data streams come through as application/octet-stream.
Would it be worth adding to our spec appendix that well-known media types are explicitly allowed in the message encoding field? E.g. `image/jpeg` seems like a no-brainer to me. For `video/h264`, I also think it would be worth explicitly stating in the appendix how we expect it to be stored with respect to timestamps.
For `video/h264`, I also think it would be worth explicitly stating in the appendix how we expect it to be stored with respect to timestamps.
Not just timestamps - but also which format (Annex-B or AVCC) and how many NAL packets. I can't say with certainty since I've not done enough research on it, but my quick read of the `video/h264` media type does not lead me to think it is sufficient as the `message_encoding` value.
In my experiments making a web viewer for h264 data in an mcap file, I used the following message encodings, which I would assume we'd define in the mcap well-known spec. We could do the same for `video/h264`, but that might not align 100% with the media type `video/h264`.
![image](https://user-images.githubusercontent.com/84792/194122339-fb23290e-bbb6-4883-b2d3-98da1d6c689e.png)
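The Annex-B vs. AVCC ambiguity mentioned above can be illustrated with a start-code heuristic. This is a sketch only; real demuxers rely on container metadata (e.g. an `avcC` box) rather than byte sniffing, and a short AVCC NAL length can masquerade as a start code:

```python
def looks_like_annexb(buf: bytes) -> bool:
    """Heuristic: Annex-B streams begin each NAL unit with a 0x000001 or
    0x00000001 start code; AVCC streams prefix each NAL with its length,
    so the leading bytes differ (except for pathological short lengths)."""
    return buf.startswith(b"\x00\x00\x01") or buf.startswith(b"\x00\x00\x00\x01")


# An Annex-B access unit starts with a start code...
annexb_sample = b"\x00\x00\x00\x01\x67\x42"
# ...while an AVCC NAL starts with a big-endian length prefix (here, 2).
avcc_sample = b"\x00\x00\x00\x02\x67\x42"
```

This is exactly the kind of ambiguity a bare `video/h264` `message_encoding` would leave unresolved, which is why the spec would need to pin down the framing.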
That aside - I do think there is value in being clear in the spec about media type use within message_encoding. It seems like a nice way to allow for storing images and other well-known formats as messages.
We would need to be careful with the wording; "explicitly allowed" is not quite right because we don't disallow strings that are not IANA-registered media types. Maybe provide an example of using `image/jpeg` with schema_id=0 to convey that this is a good practice.
For video, I think we need to keep researching and provide a working proof of concept before adding to the spec. We need to answer whether `video/h264` is sufficient, or if it should be `video/h264; codecs="avc1.4d002a"`, or something even more specific (ex: messages contain NAL Access Units in Annex-B format where Decode Order equals Output Order, i.e. no B-frames).