json-schema-spec

CBOR is a data format.

Open ioggstream opened this issue 2 years ago • 3 comments

ioggstream avatar Aug 14 '22 21:08 ioggstream

I'm still not sure I understand why this ought to be changed. We typically speak in terms of media types when we talk about mapping things into the JSON Schema data model, so the existing language is consistent with that.

I'll also bring over this conversation from the old PR, involving @awwright (1st and 3rd comments) and you (2nd comment):

As for using "media types" vs. "data formats", I don't care one way or the other, we don't actually need the well-defined term here. But CBOR is a media type, its content-type registration is application/cbor.

My understanding from reading the CBOR specification is that CBOR is a data format. The registered media type for a single encoded CBOR data item is application/cbor.

I think it's fair to call CBOR either a data format or a media type, since I would describe media types as standardized data formats.

@ioggstream can you explain more about how this usage of "media type" is wrong, and what practical negative effects it might have?

handrews avatar Aug 14 '22 22:08 handrews

@ioggstream something I'm also trying to understand here is whether there are other places where we are misusing "media type", or whether in places like this we should be saying "application/cbor" or "application/cbor and media types with a +cbor structured suffix" instead of just "CBOR".

I am not inherently against this change, I am just reluctant to agree to it without understanding the principle involved, and whether there is some well-known or documented usage convention here that we have been missing. Otherwise, sure, the first line of the CBOR RFC uses the words "data format" but I'm not aware that that has formal meaning?

handrews avatar Aug 14 '22 22:08 handrews

@handrews a media type is just an identifier (aka a label) for a data format or content. The media type registration contains all the information associated with this identifier.

IIUC a media type is not necessarily a structured data format, but can just reference a specific data content (e.g. to identify the application used to process the specific file). For example, a fictitious "text/any" media type with a parameter named format that provides more information (e.g. format=md, format=rst, ...) can be used to instruct the computer to open that file with a generic text editor.
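To make the label-plus-parameters idea concrete, here is a minimal sketch that uses Python's standard `email.message` module to split the fictitious `text/any; format=md` value into its media type and its `format` parameter. Both `text/any` and `format` are the hypothetical examples from the paragraph above, not registered names, and the `HANDLERS` table is an assumption made purely for illustration.

```python
from email.message import Message

# Hypothetical mapping from the fictitious "format" parameter to a handler name.
HANDLERS = {"md": "markdown-viewer", "rst": "rst-viewer"}

def describe(content_type: str):
    """Split a Content-Type value into (media type, format parameter, handler)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # the label only, parameters removed
    fmt = msg.get_param("format")        # None if the parameter is absent
    # Fall back to a generic editor when the parameter is missing or unknown.
    handler = HANDLERS.get(fmt, "generic-text-editor")
    return media_type, fmt, handler

print(describe("text/any; format=md"))
```

The label alone says nothing about structure here; only the parameter narrows down how the content might be processed.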

My understanding is that in this document, when speaking about media types, we often mean a data format.

The first line of the CBOR RFC uses the words "data format" but I'm not aware that that has formal meaning?

IIRC RFCs use the terms "file format", "media format" and "data format" to describe both simple and complex formats. Since it's not my field of expertise, I'd ask @cabo though.

ioggstream avatar Aug 22 '22 08:08 ioggstream

@ioggstream Apparently I didn't notice your reply at the time, my apologies. That explanation makes sense to me.

handrews avatar Oct 17 '22 02:10 handrews

Neither did I. RFC 8949 uses "data format" in the common meaning, both as a term for generic data formats such as CBOR (and BSON in Appendix E) and for specific data formats defined by an application (Section 1.1, 5; Appendix E).

cabo avatar Oct 17 '22 05:10 cabo

My understanding is that in this document, when speaking about media types, we often mean a data format.

These are subtly different things in this case. The term "media type" is used throughout specifically because JSON Schema is an Internet specification and this emphasizes compatibility with the Internet ecosystem; and in this passage, the compatibility isn't limited to different data formats; it can potentially apply to other aspects of media types including URI references or fragments.

awwright avatar Oct 17 '22 20:10 awwright

this emphasizes compatibility with the Internet ecosystem

IMHO JSON Schema is at least a "family of media types". Since the data model is the same, I think that ~data format~ "schema definition language" is a closer definition.

If we stick to the JSON Schema used in OAS3.0 we currently have at least 3 different media types used in the wild:

  • YAML serialization, which could reasonably be an application/x-yaml-schema
  • JSON serialization .... an application/x-json-schema
  • YAML serialization via OAS application/x-oas-yaml-schema

ioggstream avatar Oct 17 '22 21:10 ioggstream

@awwright @ioggstream can either of you point to something in the current specification that wouldn't work or would be problematic for implementers if we used the other person's terminology?

@awwright what in the spec wouldn't make sense if we use "data format"?

@ioggstream aside from this CBOR line, is there anywhere else in the spec that is problematic with "media type" and would be improved with "data format"?

A third option is to just delete this line. It's come up before whether it's even a good idea to mention CBOR here.

handrews avatar Oct 17 '22 23:10 handrews

@handrews That won't be wrong, and if you want to make the change with your editor hat then OK, but in general I'd like to avoid second-guessing language that has no normative function, without a really good reason.

My thought is, again, there is a reason I wrote it like that—and at the end of the day, unless there's something significantly improved (a technical inaccuracy, typo, scrivener's error, etc), nuances of phrasing of non-normative passages should be left to the discretion of the editors.

The passage "including media types like CBOR" is deploying the "presumption of nonexclusive 'include'": It is defining a group consisting of "any document or structure", and this group includes (among other members) compatible Internet media types, and specifically CBOR in case there was any doubt. And the whole passage is non-normative anyways, it's pointing out something that should be self-evidently true for the convenience of the reader.

Of course I'll entertain ideas for improvements, especially typos, scrivener's errors, etc; but this passage is completely correct as it is.

awwright avatar Oct 18 '22 01:10 awwright

I originally thought I'd leave this one alone, but upon reading it, I think I agree with @ioggstream.

We're concerned with data formats that can map to the JSON model. From my understanding, a media type

  • declares a data format
  • instructs a containing application on how to handle that data

It's the first bit that JSON Schema cares about, not what the application does with it. As long as the data format is mappable to the JSON model, we should be happy with it.

Moreover, there are many media types that all use the same data format(s). It seems simpler for the spec to mention the supported data formats (by description, not by name) rather than the media types.
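As an illustration of several media types sharing one data format, here is a minimal sketch that guesses the underlying format from a media type string by looking at its structured syntax suffix (such as `+json` or `+cbor`). The function name `underlying_format` is hypothetical, and the heuristic is an assumption: a subtype without a suffix is returned as-is, which is not a reliable format name in general.

```python
def underlying_format(media_type: str) -> str:
    """Return a best-guess data format name for a media type string."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = media_type.split(";", 1)[0].strip().lower()
    _type, _, subtype = essence.partition("/")
    # A structured syntax suffix ("+json", "+cbor") names the underlying format.
    if "+" in subtype:
        return subtype.rsplit("+", 1)[1]
    # No suffix: fall back to the bare subtype (a heuristic, not a guarantee).
    return subtype

for mt in (
    "application/cbor",
    "application/senml+cbor",
    "application/json; charset=utf-8",
    "application/problem+json",
):
    print(mt, "->", underlying_format(mt))
```

Under this sketch, four distinct media types collapse to two data formats, which is the distinction the comment above is drawing.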

gregsdennis avatar Oct 18 '22 01:10 gregsdennis

This discussion is a bit confused about the media type(s) that you intend to define with this draft (the json-schema.org language, represented in JSON or various forms of YAML maybe), and the data format that is used to represent the data that the language describes (which may be JSON, or, with the appropriate extensions, some other formats). Note that the generic data model of the CBOR data format is a strict superset of the one interoperable JSON exhibits -- the same is true for YAML.
(Of course you could represent your language in CBOR as well, as you do with YAML, but that would be less useful.) A data format becomes a media type by describing it as such (and registering a media type name).

cabo avatar Oct 18 '22 04:10 cabo

A third option is to just delete this line. It's come up before whether it's even a good idea to mention CBOR here

For this specific issue, we can remove references to CBOR. This doesn't change the topic wrt JSON Schema is not (only) a media type...

aside from this CBOR line, is there anywhere else in the spec that is problematic with "media type" and would be improved with "data format"?

I see JSON Schema as a "schema definition language" or something like that. Not a data format (:bow: forgive me, I fixed my previous comment) @handrews I think that this second topic (what's JSchema) is key: let's discuss it in another issue though.

ioggstream avatar Oct 18 '22 07:10 ioggstream

This discussion is a bit confused about the media type(s) that you intend to define with this draft

The passage is only concerned with instances (the thing being validated), although note that all schemas are someone's instance, which is why you can write JSON Schema documents in YAML and nobody is confused by what that means (see below).

For this specific issue, we can remove references to CBOR.

No, the reference to CBOR is doing something useful. What would this solve?

This doesn't change the topic wrt JSON Schema is not (only) a media type...

JSON Schema is a media type first and foremost, but not to the exclusion of other things.

The fact that JSON Schema is a media type is essential to much of its behavior. The $ref keyword uses fragment identifiers, which are defined as part of the schema media type. The $ref keyword also reads the media type of its target, which means validators can support YAML-encoded schemas without a syntax error or needing a special keyword. Both of these behaviors would have to be spelled out otherwise.
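A rough sketch of the two behaviors described above, using only the Python standard library: splitting a `$ref` value into a resource URI and a fragment identifier, and choosing a parser from the target's media type. This is not how any particular validator works. The `PARSERS` table, the `example.com` URL, and the helper names are assumptions for illustration, and YAML is left out because parsing it needs a third-party library.

```python
import json
from urllib.parse import urldefrag

# Assumed media-type-to-parser table (illustrative only).
PARSERS = {
    "application/json": json.loads,
    "application/schema+json": json.loads,
}

def split_ref(ref):
    """Separate the resource URI from the fragment identifier in a $ref value."""
    url, fragment = urldefrag(ref)
    return url, fragment

def parse_target(body, content_type):
    """Pick a parser from the target's media type, ignoring any parameters."""
    essence = content_type.split(";", 1)[0].strip().lower()
    try:
        parser = PARSERS[essence]
    except KeyError:
        raise ValueError(f"no parser registered for {essence!r}")
    return parser(body)

url, fragment = split_ref("https://example.com/schemas/address#/$defs/street")
schema = parse_target(
    '{"$defs": {"street": {"type": "string"}}}',
    "application/schema+json",
)
print(url, fragment, schema)
```

Supporting another serialization in this sketch would mean adding one entry to `PARSERS`, with no change to how `$ref` values are split.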

awwright avatar Oct 18 '22 19:10 awwright

Having now actually read a chunk of the CBOR spec (which I had not looked at in years), I'm no longer convinced we should even mention it. Its data model is substantially different, and while you can map JSON into it, mapping CBOR to JSON is lossy.

handrews avatar Jan 20 '23 02:01 handrews

To get the attention of the people who might care, a definition of how to use json-schema.org for CBOR could be done in a separate document. That doesn't change the fact that json-schema.org models will generally work for CBOR applications that stay within the JSON generic data model.
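One way to picture "staying within the JSON generic data model" is a check over already-decoded values. The sketch below assumes some CBOR library has decoded the data into plain Python objects (no decoding happens here), and the function name `within_json_model` is hypothetical. It rejects the things CBOR allows that JSON does not, such as byte strings, non-string map keys, and non-finite floats.

```python
import math

def within_json_model(value) -> bool:
    """True if a decoded value uses only types the JSON generic data model has."""
    if value is None or isinstance(value, (bool, str, int)):
        return True
    if isinstance(value, float):
        # NaN and the infinities have no JSON representation.
        return math.isfinite(value)
    if isinstance(value, (list, tuple)):
        return all(within_json_model(item) for item in value)
    if isinstance(value, dict):
        # JSON object keys must be strings; CBOR map keys can be any data item.
        return all(
            isinstance(key, str) and within_json_model(item)
            for key, item in value.items()
        )
    # Byte strings, tagged values, and anything else fall outside the JSON model.
    return False

print(within_json_model({"name": "sensor", "readings": [1, 2.5, None]}))
print(within_json_model({"payload": b"\x00\x01"}))
```

Data that passes such a check can be described by a JSON-oriented schema without loss; data that fails it is where a CBOR-to-JSON mapping becomes lossy.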

cabo avatar Jan 20 '23 09:01 cabo