
using/defining the "schema" link relation type

Open dret opened this issue 7 years ago • 28 comments

it seems that -07 recommends using schema-typed links (as one method) to link an instance to a schema. this link relation is not registered (yet). this means either there should be some spec doing this, or maybe the JSON schema draft should do it. if the latter and you need some help, let me know.

dret, Dec 19 '17 15:12

@dret thanks! Yes, we were planning to register it as part of the JSON Schema draft. Or motivated by the JSON Schema draft, or whatever the appropriate mental model is. But I have no idea what approach is likely to be successful, and any advice would be appreciated.

Should we try to register it now, or is that something that must wait until we're at RFC or at least adopted by a working group?

handrews, Dec 19 '17 18:12

I almost put in an extension link relation like tag:json-schema.org,19-11-2017:schema or something, but that seemed more likely to confuse people, and then folks would have to change it again if/when we got the link relation registered.

handrews, Dec 19 '17 18:12

definitely never ever put identifiers in specs and then change them. that’s the X- disaster all over again...

On Dec 19, 2017, at 19:32, Henry Andrews wrote:

I almost put in an extension link relation like tag:json-schema.org,19-11-2017:schema or something, but that seemed more likely to confuse people, and then folks would have to change it again if/when we got the link relation registered.


dret, Dec 19 '17 18:12

definitely never ever put identifiers in specs and then change them. that’s the X- disaster all over again...

Hah! Yeah, makes sense. We only changed from "profile" to "schema" due to the discussions here that "profile" really wasn't quite right. Given that "schema" is a pretty common concept applying to many media types, it seems like a good candidate for link relation / media type parameter.

I did just see https://github.com/dret/I-D/issues/92 about maybe using HTTP Prefer or some other proposal instead / in addition to media type parameters for "profile", so I'd be interested in considering similar options for "schema" if that makes sense.

I see "profile" and "schema" as similar, but while "profile" identifies a broadly applicable subset, "schema" identifies a set of values suitable for a much more specific purpose. It can also identify something broad, but the ability to be specific is what (I think) distinguishes it.

handrews, Dec 19 '17 19:12

On 2017-12-19 19:31, Henry Andrews wrote:

@dret thanks! Yes, we were planning to register it as part of the JSON Schema draft. Or motivated by the JSON Schema draft, or whatever the appropriate mental model is. But I have no idea what approach is likely to be successful, and any advice would be appreciated.

writing up a draft of its own may be a bit much. it would have the advantage of being more visible, and not creating the impression that "schema" is for JSON only.

but it's perfectly fine to leave it in there and make it part of the draft's IANA considerations. the only advice i have is to create a section for it that defines and describes it, and that is very careful at making it independent of JSON schema. that way, anybody interested in the definition could just read that section and wouldn't have to dive into the rest of the spec.

Should we try to register it now, or is that something that must wait until we're at RFC or at least adopted by a working group?

process-wise, if you want to register it now, a separate draft is the best way to go (it's not the only way, as there is no formal requirement that only RFCs can register link relation types). you can also leave it in there and then formally speaking, it would only become registered once JSON schema becomes an RFC.

while the IANA registry is only updated after RFC status, at http://webconcepts.info/concepts/link-relation/ i am compiling quite a number of link relation types that are defined in drafts. the reason is that these often are already in use, so keeping them "under wraps" probably is less useful than telling people that these are things that are under consideration/development. it is then up to them to decide whether they already want to get on board, or wait for the official registry to be updated.

dret, Dec 20 '17 06:12

On 2017-12-19 20:33, Henry Andrews wrote:

Hah! Yeah, makes sense. We only changed from "profile" to "schema" due to the discussions here that "profile" really wasn't quite right. Given that "schema" is a pretty common concept applying to many media types, it seems like a good candidate for link relation / media type parameter.

it certainly has popped up quite often, in a variety of communities. i am not quite sure what the best word is. some communities might call this a vocabulary or an ontology, but it may be hard to find a word that works perfectly for everybody.

I did just see https://github.com/dret/I-D/issues/92 about maybe using HTTP Prefer or some other proposal instead / in addition to media type parameters for "profile", so I'd be interested in considering similar options for "schema" if that makes sense.

i am not sure that it makes the same sense. a profile is an additional "layer" of constraints on top of a schema, so to speak. it is always safe to ignore it, because the baseline always is the schema (i.e., the media type).

for complex scenarios, you might need something more complex, such as the Profile header proposal (which to my mind uses the term "profile" in a rather different way from RFC 6906 anyway and to some extent simply may be actually a schema negotiation proposal).

I see "profile" and "schema" as similar, but while "profile" identifies a broadly applicable subset, "schema" identifies a set of values suitable for a much more specific purpose. It can also identify something broad, but the ability to be specific is what (I think) distinguishes it.

a schema sets the basics in how to represent concepts. if you cannot agree on a schema (which to me is nothing but the implementation of a domain model), then there is no mutual understanding.

a profile assumes that there is a baseline, and the profile is an optional layer of conventions. it should always be safe to go back to the baseline and still have the mutual understanding established by that baseline. (that's pretty much the atom example from the profile spec)

dret, Dec 20 '17 07:12

a schema sets the basics in how to represent concepts

This seems like a good key distinction: the schema is about representation. You may have different valid and useful representations of an abstract type. That might clarify the distinction between "type" (registered as describing the "abstract semantic type") and "schema" (describing a particular way to represent that type in a data model, which may then be encoded in one or more media types).

So you could potentially negotiate both the schema (describing the data model for a representation) and media type (the concrete encoding of that data model into a JSON/YAML/XML/whatever document).

handrews, Dec 30 '17 19:12

On 2017-12-30 11:35, Henry Andrews wrote:

So you could potentially negotiate both the schema (describing the data model for a representation) and media type (the concrete encoding of that data model into a JSON/YAML/XML/whatever document).

at least for my terminology this is not how i think about schemas. schemas are defining models and their representation, let's say an XML model in terms of how it is represented in XML concepts. that tells you everything you need to know in terms of how this will look on the wire. same for JSON schemas.

for me, RDF is the outlier here, as an exception to the usual way people use media types. RDF is an abstract model and you need to know a media type to determine how to serialize it. but the general idea that cultivating a zoo of model serializations is a good thing is more the exception than the rule.

one way or the other, i would be clear about how you use these terms, and if possible try to align them with "standard usage". but of course that's not a super well-defined concept to begin with...

dret, Dec 30 '17 19:12

schemas are defining models and their representation, let's say an XML model in terms of how it is represented in XML concepts. that tells you everything you need to know in terms of how this will look on the wire. same for JSON schemas.

JSON Schemas work on a data model that is derived from but not specific to JSON. This is why I view the schema and media type as separate. JSON Schema keeps them separate, and the expectation is that people may use JSON, JSON5, YAML, TOML, CBOR, Protobuf, whatever with them. Some of those are a better fit than others, but the extensibility of JSON Schema allows for describing concepts more precisely than the data model can directly express. You could probably make it work for XML to some degree, but I can't imagine why anyone would.

So the data model and media type are not entirely separate, but they are definitely not identical. Any definition of "schema" that requires them to be would be problematic for JSON Schema.

handrews, Dec 30 '17 20:12

On 2017-12-30 12:00, Henry Andrews wrote:

JSON Schemas work on a data model (https://tools.ietf.org/html/draft-handrews-json-schema-00#section-4.2.1) that is derived from but not specific to JSON. This is why I view the schema and media type as separate. JSON Schema keeps them separate, and the expectation is that people may use JSON, JSON5, YAML, TOML, CBOR, Protobuf, whatever with them. Some of those are a better fit than others, but the extensibility of JSON Schema allows for describing concepts more precisely than the data model can directly express. You could probably make it work for XML to some degree, but I can't imagine why anyone would.

interesting info, thanks. so it's more "JDM" (keeping in line with the XDM) than actually strictly "JSON schema". that makes sense as for example XSD also technically is an XDM schema language. it's just that nobody ever cared (much) because XML is the only relevant serialization out there.

it's just that, in my mind, when you go this route, the recurring idea of being able to indicate a "model" makes more sense terminologically than the term "schema", but that's really just a question of how people typically use these terms in their own scenarios.

(and of course you could argue that "model" is too abstract and that you could have multiple schemas for the same model, which of course is true. it's complicated. ;-)

So the data model and media type are not entirely separate, but they are definitely not identical. Any definition of "schema" that requires them to be would be problematic for JSON Schema.

ok, fair enough. i would just make all of this explicit so that people understand the terms and how they are used in that context.

dret, Dec 30 '17 20:12

Thanks, @dret this discussion is very helpful.

It's definitely occurred to me that "schema" is not necessarily the best name for JSON Schema, particularly given that people use it for so many things including data model definitions (code generation, doc generation, ui generation) to the extent that we are looking at adding vocabularies specifically for those purposes.

Then again, the project's been called JSON Schema for many years, and changing the name would likely kill any momentum of the larger ecosystem, so... ¯\_(ツ)_/¯

handrews, Dec 30 '17 20:12

I was only casually paying attention, but I wasn't actually aware the specification changed the link relation we were using. No existing link relation was sufficient?

  • "profile"
  • "describedBy"
  • "type"

For example

Link: <http://example.org/Person.schema.json>; rel="type"; type="application/schema+json"

https://tools.ietf.org/html/rfc5988 (HTTP Link header, including description of the "type" attribute) https://tools.ietf.org/html/rfc6906 (rel="profile") https://tools.ietf.org/html/rfc6903 (various relation types including rel="type")
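To make the header syntax concrete, here is a minimal Python sketch that splits a Link header value into targets and parameters and picks out the rel="type" link from the example above. It is illustrative only: parse_link_header is a hypothetical helper, and it assumes no commas, semicolons, or ">" inside quoted parameter values, so it is not a complete RFC 5988 / RFC 8288 parser.

```python
import re

# One link-value: <target> followed by zero or more ";"-prefixed parameters.
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;[^,]*)*)')
# One parameter: name=value, where value is a quoted string or a bare token.
PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value.

    Simplified sketch: assumes no commas, semicolons, or '>' inside
    quoted parameter values; parameters without a value are skipped.
    """
    links = []
    for match in LINK_VALUE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for p in PARAM.finditer(raw_params):
            name = p.group(1).lower()
            params[name] = p.group(2) if p.group(2) is not None else p.group(3)
        links.append((target, params))
    return links


header = '<http://example.org/Person.schema.json>; rel="type"; type="application/schema+json"'
for target, params in parse_link_header(header):
    # rel may list several space-separated relation types, compared case-insensitively.
    if "type" in params.get("rel", "").lower().split():
        print(target, params.get("type"))
```

The same function handles comma-separated lists of links, such as a header carrying several schema links at once.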

I know @dret suggested "profile" was incorrect for us but my reading of RFC6906 suggests we were using it exactly correctly. Perhaps I wasn't describing what we're intending to do very clearly?

awwright, Jan 01 '18 05:01

@awwright duuuuuuuuude....

  • Here is the PR on which I requested a review from you, tagged you in the issue comments, and held it open for an extra 2.5 weeks beyond the usual 2 week period specifically trying to get a review from you. By name.
  • Here is the issue in which you and I discussed the "profile" relationship extensively with @dret. You even assigned the issue to yourself when I posted that PR, and then didn't comment on it for over a month.

At some point, I can't keep waiting. I know I pinged you on email and slack because I know you are interested in this area. This is the 2nd time in the past month that someone has wanted to revisit something that I begged and pleaded for comments on. I don't know what to do when people won't reply in any meaningful way for over a month. I try direct email, @-mentions, mailing list, IRC, slack, anything and everything I can think of. I announced the final review period everywhere, often repeatedly, and it was open for a solid month. I did everything I could think of. In all seriousness, what am I missing? What do people need in order to make a timely review before publication? Or at least skim the change log, which lists this change?

Regarding "profile", if you want to keep arguing with @dret about it go ahead, but I found his reasoning quite clear and see no reason to revisit it. I don't know how you read him clearly stating that schemas are not profiles and come to a different conclusion.

You also cited HTML's "profile", which does seem much closer to what we would want, but the response to that is that just because HTML uses the same word, that does not mean that it is using it in the RFC 6906 sense, and you never replied to that concern (or made any further replies on the issue at all for the remaining ten months that it was open).

There is also the Accept-Profile proposal which is specifically using a different definition of profile than RFC 6906. So I suppose we could define our own "profile" definition, but since we're talking about a media type parameter and link relation, unlike with HTTP headers, that term is already in use. We can't just re-define it to agree with some other definition somewhere else that suits us better.

Regarding "type", as far as I can tell we both agreed that it was not quite right. It is specified to identify the "abstract semantic type", which is a more generic concept. If I identify something's "type" as "car", there may be numerous schemas that describe different ways of representing a car. In particular, as the representation evolves, new schema versions will be published and used, but the "abstract semantic type" remains the same.

So I see "type" as serving a clear purpose of abstracting away concrete representation details, while the relation we need is specifically about concrete representation.

As for "describedBy", it's still in the spec and still doing exactly what it was doing before. Although I'm not actually sure we should continue using it. AFAICT it is more often used for things like human-readable documentation, which would be useful alongside of schema links. We were also using it because "profile" is specifically an identifier and not a locator, so "describedBy" was to be the locator. But we could define things differently with our own media type parameter / link relation if we want. I'm definitely not dead-set on getting rid of it, but once we settle on the behavior we want from our own relation type we should make sure our usage still makes sense.

handrews, Jan 01 '18 19:01

@handrews Actually I do remember a lot of that now that you mention it.

Have we pinged @RubenVerborgh to get his take on how/if his usage is similar to ours & RFC 6906?

Edit: Yeah you did that too, nice.

Edit: My reading of rel="type" is that it's extremely generic:

The "type" link relation references the payload's abstract semantic type
[...]
If the context can be considered to be an instance of multiple semantic types, multiple "type" link relations can be used.

So while something more specific would be preferable, I still think "type" could be relevant.

awwright, Jan 02 '18 01:01

Yup, pinging happened here: https://github.com/ProfileNegotiation/I-D-Accept--Schema/issues/13#issuecomment-354633650

RubenVerborgh, Jan 02 '18 11:01

@awwright I think that "type" is relevant, just not sufficient. I would actually use it alongside of a more specific link relation, such that in an API that evolves (or even does coarse-grained versioning in the base URI) the resources would have a stable "type" but different schemas.

So I could use the same "type" values across the current kinda-REST-ish version of the API I'm working with, and also with a replacement fully RESTful API, even though the schemas would be very different. I'm not sure how that would be of use to clients off the top of my head, but it seems like a worthwhile distinction.
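A minimal sketch of that idea: each representation carries a stable rel="type" link alongside a version-specific schema link. The type URI, the schema URIs, the version keys, and link_header_for are all hypothetical, and rel="schema" is the not-yet-registered relation this issue is about.

```python
# Hypothetical identifiers: the abstract type stays fixed while schemas evolve.
PERSON_TYPE = "https://example.com/types/person"
SCHEMA_VERSIONS = {
    "v1": "https://example.com/schemas/person/1.0.0",
    "v2": "https://example.com/schemas/person/2.0.0",
}


def link_header_for(api_version):
    """Build a Link header value with a stable type link and a versioned schema link."""
    schema_uri = SCHEMA_VERSIONS[api_version]
    return f'<{PERSON_TYPE}>; rel="type", <{schema_uri}>; rel="schema"'


print(link_header_for("v1"))
print(link_header_for("v2"))
```

Both outputs share the same type link; only the schema link changes between API versions.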

I think we need to figure out what level of specificity we want. I have been going for something more specific, particularly to enable content negotiation. But I think that Accept-Profile is the most promising avenue for schema-based content negotiation, which makes me a bit less concerned. Still, we need a media type that will work for us in non-HTTP environments (e.g. will CoAP adopt Accept-Profile as well?)

handrews, Jan 02 '18 17:01

On 2018-01-02 09:29, Henry Andrews wrote:

I think we need to figure out what level of specificity we want. I have been going for something more specific, particularly to enable content negotiation. But I think that Accept-Profile is the most promising avenue for schema-based content negotiation, which makes me a bit less concerned. Still, we need a media type that will work for us in non-HTTP environments (e.g. will CoAP adopt Accept-Profile as well?)

fwiw, i have started working on an update of RFC 6906, the idea being to improve the wording, and to add an HTTP preference for profiles. that would not be as complex as the Accept-Profile mechanism, but should be better than having to depend on media type parameters.

any feedback on that feature, as well as on what other improvements of RFC 6906 are possible, is very welcome (ideally in the form of issues).

https://github.com/dret/I-D/tree/master/rfc6906bis

dret, Jan 02 '18 17:01

@dret Will this updated version of the RFC explain how to achieve such a preference with existing means, or will it introduce a new header (as we're planning for https://github.com/profilenegotiation/I-D-Accept--Schema/)?

RubenVerborgh, Jan 03 '18 13:01

On 2018-01-03 05:06, Ruben Verborgh wrote:

@dret Will this updated version of the RFC explain how to achieve such a preference with existing means, or will it introduce a new header (as we're planning for https://github.com/profilenegotiation/I-D-Accept--Schema/)?

it will not introduce a new header, but a new registered value for an existing header: the plan is to register a "profile" HTTP preference, which then allows HTTP-level profile negotiation.
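With that approach, a request would carry something like Prefer: profile="https://example.com/profiles/strict" alongside any existing preferences. The sketch below is a simplified, hypothetical parser for RFC 7240 Prefer values: the "profile" preference name follows the plan described above (it was not registered at the time), the profile URI is invented, and it assumes no commas or semicolons inside quoted values.

```python
def parse_prefer(header_value):
    """Map preference names (lowercased) to values; ";"-parameters are ignored.

    Simplified sketch: assumes no commas or semicolons inside quoted values.
    """
    prefs = {}
    for item in header_value.split(","):
        # Keep only "name" or "name=value"; drop any trailing parameters.
        token = item.split(";", 1)[0].strip()
        if not token:
            continue
        name, sep, value = token.partition("=")
        prefs[name.strip().lower()] = value.strip().strip('"') if sep else None
    return prefs


prefs = parse_prefer('respond-async, profile="https://example.com/profiles/strict"')
print(prefs.get("profile"))
```

A server that does not understand the profile preference can simply ignore it, which fits the "always safe to fall back to the baseline" view of profiles.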

dret, Jan 05 '18 16:01

@dret but to clarify, your "profile" preference would use your definition of "profile", which excludes schemas, so could not be used for JSON Schema, correct? So our options are:

  • Register a "schema" preference
  • Jump on the "Accept-Profile" bandwagon since it uses a broader definition of "profile" that includes schemas
  • Come up with something else (like our own media type with its own parameters such as a "schema" parameter)

Is this correct?

handrews, Jan 05 '18 17:01

On 2018-01-05 09:27, Henry Andrews wrote:

@dret but to clarify, your "profile" preference would use your definition of "profile", which excludes schemas, so could not be used for JSON Schema, correct?

that's just my conceptualization and i know that others see things differently. for me:

  • a schema is based on some metamodel and has a way to define a model. thus a schema introduces a new abstraction layer.

  • a profile is a specialized way to use a model based on additional constraints. it adds to the model (by adding constraints), but doesn't add a new abstraction layer.

So our options are:

  • Register a "schema" preference
  • Jump on the "Accept-Profile" bandwagon since it uses a broader definition of "profile" that includes schemas
  • Come up with something else (like our own media type with its own parameters such as a "schema" parameter)

Is this correct?

i don't want to be the one telling you what your options are. it seems that people are trying to do somewhat related things in this area, but then again it often seems like they're far apart enough to make it hard for people to join forces. in particular, "schema" seems to be the walking dead among the link relations.

i'd be more than happy to see if a revised RFC 6906 may be more useful for you. but if i recall correctly, you very specifically wanted to use profiles for the use cases of schemas that i wanted to separate them from. that may make it hard to find changes that fit your needs.

FYI: https://github.com/ProfileNegotiation/I-D-Accept--Schema

dret, Jan 05 '18 23:01

@handrews

I think we need to figure out what level of specificity we want. I have been going for something more specific, particularly to enable content negotiation. But I think that Accept-Profile is the most promising avenue for schema-based content negotiation, which makes me a bit less concerned.

I don't quite get the "schema-based content negotiation" part. Can you clarify? In my understanding, an instance can only have one schema. This contrasts with profiles (in the sense of "application profiles" a la Dublin Core); an instance may be represented through several profiles.

dlax, Jan 06 '18 22:01

@dlax certainly!

What I'm thinking of is a REST API that evolves representations at the per-resource granularity, rather than doing coarse grained URI-based versioning.

In this view, resources (the abstract things on the server) are not versioned. If I have Person resources, the resource is always Person. Person is a concept, it does not change.

However, the representation of a Person in a given API will likely change. Fields are added, enum sets are changed, and fields may even be removed in a compatible way if they were not required.

The versioning of the representation is expressed by assigning a new schema to each successive version (and the URIs for the schemas probably do have some sort of version in them: semantic versioning, a date-time stamp, whatever).

So let's think particularly of the case where the representations are all compatible with each other, and the schemas are designed to support that, meaning:

  • Schemas never set "additionalProperties": false so that fields can be added without causing validation to fail
  • Schemas avoid required as much as possible, so that fields can be removed without causing validation to fail

There are other constraints you can use on schema design, but these illustrate the point.
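The two schema-design rules above can be checked mechanically. This is a hypothetical linter sketch (evolvability_warnings is an invented name, not part of any JSON Schema tooling) that flags "additionalProperties": false and a non-empty "required". It only descends into "properties", so items, allOf/anyOf/oneOf, and $ref are not followed.

```python
def evolvability_warnings(schema, path="#"):
    """Flag keywords that work against compatible schema evolution.

    Only nested "properties" subschemas are walked; a fuller linter would
    also need to follow items, allOf/anyOf/oneOf, $ref, and so on.
    """
    warnings = []
    if schema.get("additionalProperties") is False:
        warnings.append(f"{path}: additionalProperties is false; new fields would fail validation")
    if schema.get("required"):
        warnings.append(f"{path}: required {schema['required']}; removing these fields would fail validation")
    for name, subschema in schema.get("properties", {}).items():
        if isinstance(subschema, dict):
            warnings.extend(evolvability_warnings(subschema, f"{path}/properties/{name}"))
    return warnings


strict = {
    "type": "object",
    "properties": {"foo": {"type": "integer"}},
    "required": ["foo"],
    "additionalProperties": False,
}
for warning in evolvability_warnings(strict):
    print(warning)
```

Run against either of the some-entity schemas in this comment, it reports nothing, since neither uses the flagged keywords.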

So:

{
    "$id": "https://example.com/schemas/some-entity/1.0.0",
    "type": "object",
    "properties": {
        "foo": {"type": "integer"},
        "bar": {"type": "string"}
    }
}
{
    "$id": "https://example.com/schemas/some-entity/1.1.0",
    "type": "object",
    "properties": {
        "foo": {"type": "integer"},
        "stuff": {"type": "boolean"}
    }
}

The instance:

{
    "foo": 42,
    "bar": "hello",
    "stuff": false,
    "nonsense": null
}

validates against either the 1.0.0 or 1.1.0 schemas. If you're doing something with bar you need 1.0.0. If you're doing something with stuff you need 1.1.0. If you're doing something with foo either will work. And there's no guidance in either version on what nonsense should look like or how to use it.
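To check that claim concretely, here is a deliberately tiny validator sketch. It supports only the "type" and "properties" keywords these schemas use, so it is not a JSON Schema implementation; in practice you would use a full one such as the jsonschema Python package. is_valid and TYPE_CHECKS are hypothetical names.

```python
# Hypothetical minimal checker: only "type" and "properties" are supported.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def is_valid(instance, schema):
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](instance):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            # Absent properties are fine: neither schema uses "required".
            if name in instance and not is_valid(instance[name], subschema):
                return False
    # Unlisted properties such as "nonsense" pass: no "additionalProperties".
    return True


schema_1_0_0 = {
    "$id": "https://example.com/schemas/some-entity/1.0.0",
    "type": "object",
    "properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}},
}
schema_1_1_0 = {
    "$id": "https://example.com/schemas/some-entity/1.1.0",
    "type": "object",
    "properties": {"foo": {"type": "integer"}, "stuff": {"type": "boolean"}},
}
instance = {"foo": 42, "bar": "hello", "stuff": False, "nonsense": None}

print(is_valid(instance, schema_1_0_0), is_valid(instance, schema_1_1_0))  # True True
```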

So if I'm an API client using the "some-entity" resource, I may have started out while it was on version 1.0.0. I would use content negotiation (media type parameter, HTTP Prefer header preference, or a new Accept-Profile or Accept-Schema header) to ask for a representation matching schema version 1.0.0.

The response can come back with the instance above and link to both the 1.0.0 and 1.1.0 schemas, because it is valid according to both and therefore usable as either.

I can use this information to log that the resource has been updated, and then a human can decide whether to start asking for 1.1.0 (because stuff is needed) or stay with 1.0.0 (because bar is still needed).
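The server side of that flow can be sketched as follows. The requested schema URI would arrive via whichever mechanism is chosen (media type parameter, Prefer, or Accept-Profile/Accept-Schema); the sketch deliberately takes it as a plain string and assumes the set of schemas the stored instance conforms to has already been computed, e.g. by validating against each known schema. negotiate and the 406 fallback are hypothetical choices, not anything the spec defines.

```python
def negotiate(requested_schema, conforming_schemas):
    """Return (status, link_header) for a request asking for one schema.

    conforming_schemas: URIs of schemas the stored instance validates against.
    """
    if requested_schema not in conforming_schemas:
        # The instance cannot be served as the requested schema.
        return 406, None
    # Advertise every schema the instance satisfies, the requested one first,
    # so the client can notice (and log) that newer versions exist.
    others = sorted(s for s in conforming_schemas if s != requested_schema)
    link = ", ".join(f'<{uri}>; rel="schema"' for uri in [requested_schema] + others)
    return 200, link


conforming = {
    "https://example.com/schemas/some-entity/1.0.0",
    "https://example.com/schemas/some-entity/1.1.0",
}
status, link = negotiate("https://example.com/schemas/some-entity/1.0.0", conforming)
print(status)
print(link)
```

A client that asked for 1.0.0 and sees 1.1.0 in the response links has exactly the signal described above for deciding whether to upgrade.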

I can go into more detail but I'll pause here to see if this is making sense.

handrews, Jan 06 '18 23:01

On Jan 6, 2018, at 13:31, Henry Andrews wrote:

What I'm thinking of is a REST API that evolves representations at the per-resource granularity, rather than doing coarse grained URI-based versioning.

....

I can use this information to log that the resource has been updated, and then a human can decide whether to start asking for 1.1.0 (because stuff is needed) or stay with 1.0.0 (because bar is still needed).

I can go into more detail but I'll pause here to see if this is making sense.

all of this makes sense, and as you can tell, our general design thinking is very close: http://dret.typepad.com/dretblog/2016/04/robust-extensibility.html

what i am wondering is why you then wouldn’t simply call it what it is (semantic versioning, as bad as that name is), represent that in your instance, and be done. why the need to piggyback all the versioning ideas you have onto schemas and their naming conventions? it seems like you’re mixing things a little.

dret, Jan 07 '18 01:01

On Jan 6, 2018, at 12:52, Denis Laxalde wrote:

I don't quite get the "schema-based content negotiation" part. Can you clarify? In my understanding, an instance can only have one schema. This contrasts with profiles (in the sense of "application profiles" a la Dublin Core); an instance may be represented through several profiles.

ever heard of DSDL? there almost by definition each instance has multiple schemas. it’s actually a very clever approach: modularize validation like any other non-trivial task, and allow schemas to be composed of multiple languages with each language having a specific focus.

dret, Jan 07 '18 01:01

@handrews

I can go into more detail but I'll pause here to see if this is making sense.

This makes sense, thanks!

@dret

ever heard of DSDL? there almost by definition each instance has multiple schemas. it’s actually a very clever approach: modularize validation like any other non-trivial task, and allow schemas to be composed of multiple languages with each language having a specific focus.

Actually, I was more thinking about "schema as a data model" rather than "schema as a validation tool". @handrews explained his intended usage of content negotiation for the former "definition" and I now see how a client may ask for a resource that follows a given data model. So about multiple validation and DSDL, yes, a resource may have several complementary schemas, but it's not clear to me how negotiation would come into play as far as validation is concerned. Would a client ask for a resource that validates with a particular technology? Would that resource's representation be different if another technology had been asked? Maybe it's just irrelevant and only the data model (or profile) point of view matters for negotiation.

dlax, Jan 07 '18 10:01

@dret ever heard of DSDL? there almost by definition each instance has multiple schemas. it’s actually a very clever approach: modularize validation like any other non-trivial task, and allow schemas to be composed of multiple languages with each language having a specific focus.

Actually, I was more thinking about "schema as a data model" rather than "schema as a validation tool".

that's what it looked like to me. but i think then you're more moving into the direction of the "type" link, which opens up a different can of worms (and probably should be treated as something different, as a schema typically is an implementation of a type).

even something as widely known as HTML has different schemas, or at least it used to have when it was more declaratively defined. there was a "strict" schema intended to be used for production, and a "loose" schema intended to be used for consumption. regardless of the details, it just demonstrates that "schema" and "type" are different things.

@handrews explained his intended usage of content negotiation for the former "definition" and I now see how a client may ask for a resource that follows a given data model.

that again sounds very much like "type".

So about multiple validation and DSDL, yes, a resource may have several complementary schemas, but it's not clear to me how negotiation would come into play as far as validation is concerned. Would a client ask for a resource that validates with a particular technology? Would that resource's representation be different if another technology had been asked? Maybe it's just irrelevant and only the data model (or profile) point of view matters for negotiation.

well, DSDL probably is not a thing anymore. but yes, it used to be the case that you would validate for the concern you had at the moment. if you wanted to see if character ranges were respected, you would validate with that particular schema. to figure out how namespaces were being used you used the language specialized in that, and so forth. it was actually quite an elegant design.

but really, let's not talk about DSDL. i just wanted to point out that there can be multiple schemas, and that what you seem to be talking about is more concerned with just versioning and possibly "type".

dret, Jan 07 '18 19:01

https://json-schema.org/latest/json-schema-core.html#rfc.section.11.1 proposes: Link: <https://example.com/my-hyper-schema#>; rel="describedby"

But then the final paragraphs of https://json-schema.org/latest/json-schema-core.html#rfc.section.11.2 has: Link: </alice>;rel="schema", </bob>;rel="schema"

I don't understand why these use different link relations?
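Until the two forms are reconciled, a client could defensively accept either one. In this hedged sketch, schema_uris and the example URLs are hypothetical, and treating both "describedby" and "schema" as pointing at a schema is an assumption, not spec behavior. It takes (target, rel) pairs already parsed from Link headers and resolves relative targets such as /alice against the request URL with urllib.parse.urljoin.

```python
from urllib.parse import urljoin

# Assumption: treat both relation types as links to a schema.
SCHEMA_RELATIONS = {"schema", "describedby"}


def schema_uris(links, base_url):
    """Collect absolute schema URIs from already-parsed (target, rel) pairs."""
    found = []
    for target, rel in links:
        # rel may hold several space-separated relation types, compared case-insensitively.
        if SCHEMA_RELATIONS & set(rel.lower().split()):
            found.append(urljoin(base_url, target))
    return found


links = [
    ("https://example.com/my-hyper-schema#", "describedby"),
    ("/alice", "schema"),
    ("/bob", "schema"),
]
for uri in schema_uris(links, "https://example.com/people/1"):
    print(uri)
```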

Sorry if I should have raised this as a new issue, @handrews!

garethsb, Nov 06 '19 08:11