zarr-specs

mime type / encoding format conventions

Open satra opened this issue 3 years ago • 13 comments

we are trying to include some type information in our jsonld descriptors of a zarr asset. i searched but could not find an existing mime type for zarr. would application/x-zarr be appropriate?

satra avatar Jan 07 '22 17:01 satra

I don't recall how we got there but we've used application/vnd+zarr in the past.

jhamman avatar Jan 10 '22 01:01 jhamman

Hi all - Unidata and the netCDF community are working on registering the application/netcdf media type with IANA (see netCDF GH Issue 42). Here are a few notes on the registration process in case they are useful.

The process for registering a media type with IANA (defined in RFC 6838) has an unregistered namespace that "may be used for [media] types intended exclusively for use in private, local environments". Subtypes in the unregistered namespace/tree are prefixed with “x.”, which replaces the older “x-” prefix.

The vendor tree/namespace (prefixed with “vnd.”) is used for "media types associated with publicly available products". A suffix starting with “+” has a special meaning in IANA media type names: it names a structured syntax, as in “+json” or “+zip”. So, application/vnd.zarr would fit the IANA model better than application/vnd+zarr. Vendor tree media types need to be registered, but registration and review are lightweight compared to the standards tree.

The standards tree (no prefix) is intended for “[media] types of general interest to the Internet community”. Media types registered in the standards tree must either be:

  1. “in the case of registrations associated with IETF specifications, approved directly by the IESG”
  2. “registered by a recognized standards-related organization” (IESG makes a one-time decision on whether the submitter represents a recognized standards-related organization). This option also requires a well-defined specification for the media type.
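
To make the tree rules above concrete, here is a minimal sketch that names the registration tree implied by a media type string. The function name `registration_tree` is hypothetical, and the check is purely syntactic: it says nothing about whether a type is actually registered with IANA.

```python
def registration_tree(media_type: str) -> str:
    """Name the RFC 6838 registration tree implied by a media type string.

    Purely syntactic: it does not check the IANA registry.
    """
    # Drop any parameters, then split "type/subtype".
    essence = media_type.split(";", 1)[0].strip().lower()
    _top_level, _, subtype = essence.partition("/")

    # Facets are separated by "."; the first facet selects the tree.
    first_facet, dot, _rest = subtype.partition(".")
    trees = {"vnd": "vendor", "prs": "personal", "x": "unregistered"}
    if dot and first_facet in trees:
        return trees[first_facet]

    # The older "x-" prefix is no longer part of the "x." tree.
    if subtype.startswith("x-"):
        return "legacy x- prefix"

    # No recognized tree facet: syntactically a standards-tree name.
    return "standards"


# The candidates mentioned in this thread.
for candidate in (
    "application/x-zarr",
    "application/vnd+zarr",
    "application/vnd.zarr",
    "application/zarr",
):
    print(candidate, "->", registration_tree(candidate))
```

Note that application/vnd+zarr has no "vnd." facet, so syntactically it reads as a standards-tree subtype "vnd" with a "+zarr" suffix, which is probably not what was intended.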

Registration on the full standards tree registry can take some time and effort. However, there is a provisional registration process available to facilitate prototyping and testing. The main hurdle for provisional registration is getting recognized as a “standards-related organization”. There are a number of standards and steering committees that are recognized as such. So, if Zarr decides to register on the standards tree, the Zarr Steering Committee might be the entity to get recognized.

This is as far as we’ve gotten for netCDF (application/netcdf is listed on the provisional standard media type registry). So I don’t yet know the details of the review part of the full registration process.

ethanrd avatar Jan 11 '22 17:01 ethanrd

@satra, for which files are you thinking of adding a mimetype? The fact that there are multiple makes this an interesting problem. e.g. if someone downloads a chunk and learns that it's "application/zarr" or whatever, what can they do with that without the rest of the fileset?

I don't recall how we got there but we've used application/vnd+zarr in the past.

@jhamman, you use this for each .zgroup, .zarray and .zattrs file? Conceivably these could also have a prominent "json" in the mimetype.

joshmoore avatar Jan 17 '22 10:01 joshmoore

@jhamman, you use this for each .zgroup, .zarray and .zattrs file? Conceivably these could also have a prominent "json" in the mimetype.

So, we're using application/vnd+zarr as the asset media type in the STAC context where an asset is represented as a path that points to a directory that contains a .zgroup. We are not using the media types to represent the types of metadata or data objects within a zarr dataset.

jhamman avatar Jan 17 '22 21:01 jhamman

for which files are you thinking of adding a mimetype?

@joshmoore - same as @jhamman. in our archive we are using NestedDirectoryStore hosted on s3 as an asset. only the top level path (e.g., /path/to/somename.ngff) in our database returns this mime-type within the metadata record, not the individual files underneath. we left our implementation for now with application/x-zarr with the possibility of converging on whatever consensus emerges.

satra avatar Jan 17 '22 22:01 satra

I just saw on a webinar from @bilts that NASA Harmony is using the mime type application/x-zarr for Zarr assets.

rabernat avatar Feb 01 '22 18:02 rabernat

Quote: A media type consists of a type and a subtype, which is further structured into a tree. A media type can optionally define a suffix and parameters:

  • type "/" [tree "."] subtype ["+" suffix]* [";" parameter]
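
As a rough illustration of that grammar, the following sketch splits a media type string into its parts. `MediaType` and `parse_media_type` are hypothetical names, and the parser is deliberately simplified: it assumes that only "vnd", "prs" and "x" name a tree, treats whatever follows the last "+" as the suffix, and does not handle quoted parameter values that contain ";".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MediaType:
    type: str
    tree: Optional[str]
    subtype: str
    suffix: Optional[str]
    parameters: Dict[str, str] = field(default_factory=dict)


def parse_media_type(value: str) -> MediaType:
    """Split 'type/[tree.]subtype[+suffix][; key=value]' into its parts."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    top_level, slash, rest = essence.lower().partition("/")
    if not slash or not top_level or not rest:
        raise ValueError(f"not a type/subtype pair: {value!r}")

    # Characters after the last "+" form the structured syntax suffix.
    name, plus, suffix = rest.rpartition("+")
    if not plus:
        name, suffix = rest, None

    # Characters before the first "." form the tree facet, if recognized.
    first_facet, dot, remainder = name.partition(".")
    if dot and first_facet in {"vnd", "prs", "x"}:
        tree, subtype = first_facet, remainder
    else:
        tree, subtype = None, name

    parameters = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        if key:
            parameters[key.strip().lower()] = val.strip().strip('"')

    return MediaType(top_level, tree, subtype, suffix, parameters)


print(parse_media_type("application/vnd.zarr"))
print(parse_media_type("application/vnd+zarr"))
print(parse_media_type("application/zip+zarr"))
print(parse_media_type("text/plain; charset=utf-8"))
```

Under these assumptions, application/vnd.zarr parses with tree "vnd" and subtype "zarr", while application/vnd+zarr parses with no tree, subtype "vnd" and suffix "zarr".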

Excerpts from a partial read of https://www.rfc-editor.org/rfc/rfc6838.html:

  • 3.4. Unregistered x. Tree
    • "Subtype names with "x." as the first facet may be used for types intended exclusively for use in private, local environments." (The same section calls use of this tree "strongly discouraged" and notes that names beginning with "x-" are no longer considered part of it.)
  • 4.2.5. Application Media Types
    • "The "application" top-level type is to be used for discrete data that do not fit under any of the other type names, and particularly for data to be processed by some type of application program."
  • 4.2.8. Structured Syntax Name Suffixes
    • "Since this was published, the de facto practice has arisen for using this suffix convention for other well-known structuring syntaxes. In particular, media types have been registered with suffixes such as "+der", "+fastinfoset", and "+json". This specification formalizes this practice and sets up a registry for structured type name suffixes."

Based on these, my general thoughts are:

  • To avoid x[.-] if possible.
  • We might well be vnd. since "industry consortia as well as non-commercial entities that do not qualify as recognized standards-related organizations can quite appropriately register media types in the vendor tree." but I think we could go for one of the other trees.
  • application/zarr certainly seems to be a natural fit especially since it's unlikely that too much can be done with the entity without the proper application, but
  • I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.
  • Finally there's going whole hog a la RFC 7303 - "XML Media Types"

joshmoore avatar Feb 04 '22 07:02 joshmoore

ping @yarikoptic

satra avatar Mar 06 '22 17:03 satra

Is there any precedent for using mime types to refer to directory trees as opposed to individual files?

jbms avatar Mar 06 '22 17:03 jbms

there have been several efforts: https://www.w3.org/2002/12/cal/rfc2425.html and various vendor-specific things, including directories on android: vnd.android.cursor.dir

but nothing looking at the type of directory based stores that we are considering here.

satra avatar Mar 06 '22 18:03 satra

Is there any precedent for using mime types to refer to directory trees as opposed to individual files?

FWIW I thought to check what http://github.com/file/file (libmagic) thinks -- looking at source and running (on linux) I think all directories are just inode/directory and I don't even see that one among iana.../...media-types.xhtml.
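
For comparison, here is a minimal sketch of what a Zarr-aware check on a local directory could look like. It is not how libmagic works; it only looks for the Zarr v2 marker files mentioned earlier in the thread (.zgroup, .zarray). The function name `guess_directory_media_type` is hypothetical, and `application/vnd.zarr` is just one of the candidates discussed here, not a registered type.

```python
import json
import tempfile
from pathlib import Path

# Assumption: one of the candidate names from this thread, not a registered type.
ZARR_MEDIA_TYPE = "application/vnd.zarr"
# Zarr v2 metadata files that mark a directory as a group or an array.
ZARR_V2_MARKERS = (".zgroup", ".zarray")


def guess_directory_media_type(path: Path) -> str:
    """Return a media type for a directory, special-casing Zarr v2 stores."""
    if not path.is_dir():
        raise NotADirectoryError(path)
    if any((path / marker).is_file() for marker in ZARR_V2_MARKERS):
        return ZARR_MEDIA_TYPE
    # Fall back to the generic label that libmagic reports for directories.
    return "inode/directory"


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "plain"
    plain.mkdir()

    store = Path(tmp) / "somename.zarr"
    store.mkdir()
    (store / ".zgroup").write_text(json.dumps({"zarr_format": 2}))

    plain_type = guess_directory_media_type(plain)
    store_type = guess_directory_media_type(store)
    print(plain_type)
    print(store_type)
```

This only inspects the top-level path, which matches the usage described above where the media type is attached to the store root and not to the individual files underneath.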

  • I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.

I wonder if it shouldn't be the other way around, i.e. have /zarr and then possibly the +suffix (e.g., +zip assuming that +directory is like a default.) rfc6838 ref on suffixes

examples from media-types
$> curl --silent https://www.iana.org/assignments/media-types/media-types.xhtml | grep 'application.*+zip'
              <a href="application/bacnet-xdd+zip">application/bacnet-xdd+zip</a>
              <a href="application/epub+zip">application/epub+zip</a>
              <a href="application/lpf+zip">application/lpf+zip</a>
              <a href="application/p21+zip">application/p21+zip</a>
              <a href="application/prs.hpub+zip">application/prs.hpub+zip</a>
              <a href="application/vnd.comicbook+zip">application/vnd.comicbook+zip</a>
              <a href="application/vnd.d2l.coursepackage1p0+zip">application/vnd.d2l.coursepackage1p0+zip</a>
              <a href="application/vnd.espass-espass+zip">application/vnd.espass-espass+zip</a>
              <a href="application/vnd.etsi.asic-s+zip">application/vnd.etsi.asic-s+zip</a>
              <a href="application/vnd.etsi.asic-e+zip">application/vnd.etsi.asic-e+zip</a>
              <a href="application/vnd.exstream-empower+zip">application/vnd.exstream-empower+zip</a>
              <a href="application/vnd.familysearch.gedcom+zip">application/vnd.familysearch.gedcom+zip</a>
              <a href="application/vnd.ficlab.flb+zip">application/vnd.ficlab.flb+zip</a>
              <a href="application/vnd.gov.sk.e-form+zip">application/vnd.gov.sk.e-form+zip</a>
              <a href="application/vnd.imagemeter.folder+zip">application/vnd.imagemeter.folder+zip</a>
              <a href="application/vnd.imagemeter.image+zip">application/vnd.imagemeter.image+zip</a>
              <a href="application/vnd.iso11783-10+zip">application/vnd.iso11783-10+zip</a>
              <a href="application/vnd.logipipe.circuit+zip">application/vnd.logipipe.circuit+zip</a>
              <a href="application/vnd.maxar.archive.3tz+zip">application/vnd.maxar.archive.3tz+zip</a>

yarikoptic avatar Mar 07 '22 18:03 yarikoptic

I wonder if it shouldn't be the other way around, i.e. have /zarr and then possibly the +suffix (e.g., +zip assuming that +directory is like a default.) rfc6838 ref on suffixes

:+1: I could see that. Though I think the +zarr as with +sqlite3 or +zip could still be useful even if we want to target application/[vnd.]zarr for most cases. Though perhaps the fact that only one suffix is intended could come back to bite us.

joshmoore avatar Mar 07 '22 18:03 joshmoore

@satra It appears that both of those examples, https://www.w3.org/2002/12/cal/rfc2425.html proposing a text/directory mime type, and vnd.android.cursor.dir, logically represent some sort of collection of items, but are in fact still represented as a single file or byte stream.

Note: application/zip+zarr would correspond to a single file (the zip file) so there is no issue there.
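
A minimal sketch of that point, using only the Python standard library: a (here nearly empty) Zarr v2 group written into a zip archive is a single byte stream, so a media type can describe it in the usual way. The member contents are illustrative, and application/zip+zarr is a name floated in this thread, not a registered type.

```python
import io
import json
import zipfile

# Build a tiny zip archive in memory that holds Zarr v2 group metadata.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr(".zgroup", json.dumps({"zarr_format": 2}))
    archive.writestr(".zattrs", json.dumps({}))

# The whole store is now one byte stream that could be served as one entity.
payload = buffer.getvalue()
is_zip = zipfile.is_zipfile(io.BytesIO(payload))

# Reading it back recovers the individual keys of the store.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    members = sorted(archive.namelist())

print(len(payload), is_zip, members)
```

A directory-backed store has no equivalent single payload, which is the distinction at issue here.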

I can see the benefit of using a mime type if you have an existing database where things are identified by mime types. But my understanding is that so far mime types have been limited to identifying the format of a single file / byte stream. We may want to be careful in using mime types outside of their normal scope --- and perhaps at least see if this is something that has been done before.

jbms avatar Mar 07 '22 19:03 jbms