zarr-specs
mime type / encoding format conventions
we are trying to include some type information in our jsonld descriptors of a zarr asset. i searched but could not find a mime type for zarr. would application/x-zarr be appropriate?
I don't recall how we got there but we've used application/vnd+zarr in the past.
Hi all - Unidata and the netCDF community are working on registering the application/netcdf media type with IANA (see netCDF GH Issue 42). Here are a few notes on the registration process in case they are useful.
The process for registering a media type with IANA (defined in RFC 6838) has an unregistered namespace that "may be used for [media] types intended exclusively for use in private, local environments". The sub-type in the unregistered namespace/tree is prefixed with a “x.”, which replaces the older “x-” prefix.
The vendor tree/namespace (prefixed with “vnd.”) is used for "media types associated with publicly available products". A suffix starting with “+” has a special meaning in IANA media type names. So, application/vnd.zarr would fit the IANA model better than application/vnd+zarr. Vendor tree media types need to be registered, but registration and review are lightweight compared to the standards tree.
The standards tree (no prefix) is intended for “[media] types of general interest to the Internet community”. Media types registered in the standards tree must either be:
- “in the case of registrations associated with IETF specifications, approved directly by the IESG”
- “registered by a recognized standards-related organization” (IESG makes a one-time decision on whether the submitter represents a recognized standards-related organization). This option also requires a well defined specification for the media type.
Registration on the full standards tree registry can take some time and effort. However, there is a provisional registration process available to facilitate prototyping and testing. The main hurdle for provisional registration is getting recognized as a “standards-related organization”. There are a number of standards and steering committees that are recognized as such. So, if Zarr decides to register on the standards tree, the Zarr Steering Committee might be the entity to get recognized.
This is as far as we’ve gotten for netCDF (application/netcdf is listed on the provisional standard media type registry). So I don’t yet know the details of the review part of the full registration process.
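To make the tree prefixes described above concrete, here is a minimal Python sketch that classifies a media type name by the first facet of its subtype. The function name `registration_tree` and the returned labels are made up for illustration, and the check looks only at the name; it says nothing about whether a type is actually registered with IANA.

```python
def registration_tree(media_type: str) -> str:
    """Guess the RFC 6838 registration tree from the media type name alone."""
    _top_level, _, subtype = media_type.strip().lower().partition("/")
    if not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("prs."):
        return "personal"
    if subtype.startswith("x."):
        return "unregistered"
    if subtype.startswith("x-"):
        # Older convention; no longer considered part of the "x." tree.
        return "legacy x-"
    # No tree prefix: the name reads as a standards-tree subtype.
    return "standards"


# The candidates mentioned in this thread.
for candidate in (
    "application/x-zarr",
    "application/vnd+zarr",
    "application/vnd.zarr",
    "application/zarr",
):
    print(f"{candidate:24} -> {registration_tree(candidate)}")
```

Note that application/vnd+zarr has no "vnd." facet, so by name it reads as a prefix-less subtype "vnd" with a "+zarr" suffix, which is why application/vnd.zarr fits the model better.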
@satra, for which files are you thinking of adding a mimetype? The fact that there are multiple makes this an interesting problem. e.g. if someone downloads a chunk and learns that it's "application/zarr" or whatever, what can they do with that without the rest of the fileset?
> I don't recall how we got there but we've used application/vnd+zarr in the past.
@jhamman, you use this for each .zgroup, .zarray and .zattrs file? Conceivably these could also have a prominent "json" in the mimetype.
> @jhamman, you use this for each .zgroup, .zarray and .zattrs file? Conceivably these could also have a prominent "json" in the mimetype.
So, we're using application/vnd+zarr as the asset media type in the STAC context where an asset is represented as a path that points to a directory that contains a .zgroup. We are not using the media types to represent the types of metadata or data objects within a zarr dataset.
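As a rough illustration of this convention, the sketch below builds a STAC-style asset entry for a directory only when that directory contains a .zgroup file. It assumes a local filesystem path for simplicity (object stores such as S3 would need a different existence check), and the helper names `is_zarr_group` and `zarr_asset` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Media type value used in this thread; it is not an IANA-registered type.
ZARR_MEDIA_TYPE = "application/vnd+zarr"


def is_zarr_group(path: Path) -> bool:
    """Return True if `path` is a directory holding a Zarr v2 group marker."""
    return path.is_dir() and (path / ".zgroup").is_file()


def zarr_asset(path: Path, title: str) -> dict:
    """Describe the top-level directory (not its member files) as one asset."""
    if not is_zarr_group(path):
        raise ValueError(f"{path} does not contain a .zgroup file")
    return {
        "href": path.as_posix(),
        "type": ZARR_MEDIA_TYPE,
        "title": title,
        "roles": ["data"],
    }


# Demo on a throwaway directory that mimics an empty Zarr v2 group.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "example.zarr"
    root.mkdir()
    (root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    print(json.dumps(zarr_asset(root, "Example Zarr group"), indent=2))
```

Only the top-level path carries the media type here; the .zgroup, .zarray, .zattrs and chunk objects underneath are deliberately left untyped, matching the usage described above.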
> for which files are you thinking of adding a mimetype?
@joshmoore - same as @jhamman. in our archive we are using a NestedDirectoryStore hosted on s3 as an asset. only the top level path (e.g., /path/to/somename.ngff) in our database returns this mime-type within the metadata record, not the individual files underneath. we left our implementation for now with application/x-zarr with the possibility of converging on whatever consensus emerges.
I just saw on a webinar from @bilts that NASA Harmony is using the mime type application/x-zarr for Zarr assets.
Quote: A media type consists of a type and a subtype, which is further structured into a tree. A media type can optionally define a suffix and parameters:
type "/" [tree "."] subtype ["+" suffix]* [";" parameter]
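The grammar quoted above can be sketched as a small Python parser. This is an illustrative simplification, not a validating implementation: it recognises only the "vnd", "prs" and "x" tree prefixes, treats the text after the last "+" as the suffix, and splits parameters naively on ";" (so quoted values containing ";" are not handled). The names `MediaType` and `parse_media_type` are made up for this example.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    type: str
    tree: Optional[str]
    subtype: str
    suffix: Optional[str]
    parameters: dict


def parse_media_type(value: str) -> MediaType:
    """Split a media type string into type, tree, subtype, suffix, parameters."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    top_level, sep, rest = essence.lower().partition("/")
    if not sep or not top_level or not rest:
        raise ValueError(f"not a type/subtype pair: {value!r}")

    # Structured syntax suffix: everything after the last "+", if any.
    suffix = None
    if "+" in rest:
        rest, _, suffix = rest.rpartition("+")

    # Registration tree: the first facet, when it is a known tree prefix.
    first, dot, remainder = rest.partition(".")
    if dot and first in {"vnd", "prs", "x"}:
        tree, subtype = first, remainder
    else:
        tree, subtype = None, rest

    parameters = {}
    for item in raw_params:
        if not item:
            continue
        key, _, val = item.partition("=")
        parameters[key.strip().lower()] = val.strip().strip('"')

    return MediaType(top_level, tree, subtype, suffix, parameters)


for example in (
    "application/vnd.zarr",
    "application/vnd+zarr",
    "application/epub+zip",
    "application/vnd.comicbook+zip; version=1",
):
    print(example, "->", parse_media_type(example))
```

Under this reading, application/vnd+zarr has no tree at all: "vnd" becomes the subtype and "zarr" the suffix.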
Excerpts from a partial read of https://www.rfc-editor.org/rfc/rfc6838.html:
- 3.4. Unregistered x. Tree
  - "Subtype names with "x." as the first facet may be used for types intended exclusively for use in private, local environments." (The section also says that use of this tree is "discouraged" and that names beginning with "x-" are not considered part of this tree.)
- 4.2.5. Application Media Types
  - "The "application" top-level type is to be used for discrete data that do not fit under any of the other type names, and particularly for data to be processed by some type of application program."
- 4.2.8. Structured Syntax Name Suffixes
  - "Since this was published, the de facto practice has arisen for using this suffix convention for other well-known structuring syntaxes. In particular, media types have been registered with suffixes such as "+der", "+fastinfoset", and "+json". This specification formalizes this practice and sets up a registry for structured type name suffixes."
Based on these, my general thoughts are:
- To avoid x[.-] if possible.
- We might well be vnd. since "industry consortia as well as non-commercial entities that do not qualify as recognized standards-related organizations can quite appropriately register media types in the vendor tree." but I think we could go for one of the other trees.
- application/zarr certainly seems to be a natural fit especially since it's unlikely that too much can be done with the entity without the proper application, but
- I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.
- Finally there's going whole hog a la RFC 7303 - "XML Media Types"
ping @yarikoptic
Is there any precedent for using mime types to refer to directory trees as opposed to individual files?
there have been several efforts : https://www.w3.org/2002/12/cal/rfc2425.html
and various vendor specific things including directories on android: vnd.android.cursor.dir
but nothing looking at the type of directory based stores that we are considering here.
> Is there any precedent for using mime types to refer to directory trees as opposed to individual files?
FWIW I thought to check what http://github.com/file/file (libmagic) thinks -- looking at source and running (on linux) I think all directories are just inode/directory and I don't even see that one among iana.../...media-types.xhtml.
> - I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.
I wonder if it shouldn't be the other way around, i.e. have /zarr and then possibly the +suffix (e.g., +zip assuming that +directory is like a default.) rfc6838 ref on suffixes
examples from media-types
$> curl --silent https://www.iana.org/assignments/media-types/media-types.xhtml | grep 'application.*+zip'
<a href="application/bacnet-xdd+zip">application/bacnet-xdd+zip</a>
<a href="application/epub+zip">application/epub+zip</a>
<a href="application/lpf+zip">application/lpf+zip</a>
<a href="application/p21+zip">application/p21+zip</a>
<a href="application/prs.hpub+zip">application/prs.hpub+zip</a>
<a href="application/vnd.comicbook+zip">application/vnd.comicbook+zip</a>
<a href="application/vnd.d2l.coursepackage1p0+zip">application/vnd.d2l.coursepackage1p0+zip</a>
<a href="application/vnd.espass-espass+zip">application/vnd.espass-espass+zip</a>
<a href="application/vnd.etsi.asic-s+zip">application/vnd.etsi.asic-s+zip</a>
<a href="application/vnd.etsi.asic-e+zip">application/vnd.etsi.asic-e+zip</a>
<a href="application/vnd.exstream-empower+zip">application/vnd.exstream-empower+zip</a>
<a href="application/vnd.familysearch.gedcom+zip">application/vnd.familysearch.gedcom+zip</a>
<a href="application/vnd.ficlab.flb+zip">application/vnd.ficlab.flb+zip</a>
<a href="application/vnd.gov.sk.e-form+zip">application/vnd.gov.sk.e-form+zip</a>
<a href="application/vnd.imagemeter.folder+zip">application/vnd.imagemeter.folder+zip</a>
<a href="application/vnd.imagemeter.image+zip">application/vnd.imagemeter.image+zip</a>
<a href="application/vnd.iso11783-10+zip">application/vnd.iso11783-10+zip</a>
<a href="application/vnd.logipipe.circuit+zip">application/vnd.logipipe.circuit+zip</a>
<a href="application/vnd.maxar.archive.3tz+zip">application/vnd.maxar.archive.3tz+zip</a>
> I wonder if it shouldn't be the other way around, i.e. have /zarr and then possibly the +suffix (e.g., +zip assuming that +directory is like a default.) rfc6838 ref on suffixes
:+1: I could see that. Though I think the +zarr as with +sqlite3 or +zip could still be useful even if we want to target application/[vnd.]zarr for most cases. Though perhaps the fact that only one suffix is intended could come back to bite us.
@satra It appears that both of those examples, https://www.w3.org/2002/12/cal/rfc2425.html proposing a text/directory mime type, and vnd.android.cursor.dir, logically represent some sort of collection of items, but are in fact still represented as a single file or byte stream.
Note: application/zip+zarr would correspond to a single file (the zip file) so there is no issue there.
I can see the benefit of using a mime type if you have an existing database where things are identified by mime types. But my understanding is that so far mime types have been limited to identifying the format of a single file / byte stream. We may want to be careful in using mime types outside of their normal scope --- and perhaps at least see if this is something that has been done before.