json-schema-vocabularies Multi lingual annotations

Historically, OpenAPI and JSON Schema have worked together to document existing APIs. In a contract-first environment they are being asked to do more than that: they define APIs that are yet to be built. JSON Schema is no longer just about payload validation; the annotations in a JSON Schema complete the definition of the payloads well enough to specify APIs that do not exist yet.

When defining enumeration values, JSON Schema lets us give each enumeration value annotations:

  type: object
  properties:
    Name:
      type: string
    Gender:
      type: string
      oneOf:
        - const: "M"
          title: "Male"
        - const: "F"
          title: "Female"
        - const: "D"
          title: "Diverse"
          description: "Used where gender known, and is non-binary"

When JSON Schema is used to define APIs and message payloads, it is not unreasonable for software developers to populate user interface elements (radio buttons, drop-down lists, etc.) from the annotation values in the JSON Schema. In a multilingual environment, we may need multiple annotations, keyed by language.

Expanding the definition of the title (and perhaps description) annotations to be either a single string or an object with properties keyed on language tags would be a good idea:

  type: object
  properties:
    Name:
      type: string
    Gender:
      type: string
      oneOf:
        - const: "M"
          title: {"en-NZ": "Male", "mi-NZ": "Tāne" }
        - const: "F"
          title: {"en-NZ": "Female", "mi-NZ": "Wahine" }
        - const: "D"
          title: {"en-NZ": "Diverse", "mi-NZ": "Ira tāngata kōwhiri kore" }
          description: "Used where gender known, and is non-binary"
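
A consumer of such a schema would need to handle both shapes of title. The following is only a minimal sketch of how that could look, assuming the proposed "string or object keyed on language tags" form; the function name resolve_title, the fallback order, and the default_language parameter are hypothetical and not part of any JSON Schema specification.

```python
from typing import Mapping, Optional, Union

# A title is either a plain string or an object keyed by language tag (assumed shape).
Title = Union[str, Mapping[str, str]]


def resolve_title(title: Title, language: str, default_language: str = "en-NZ") -> Optional[str]:
    """Pick a display string for `language` from a single- or multi-language title."""
    # Monolingual schemas keep working unchanged.
    if isinstance(title, str):
        return title
    # 1. Exact language tag match.
    if language in title:
        return title[language]
    # 2. Match on the primary subtag only, so "mi" can be served by "mi-NZ".
    primary = language.split("-")[0].lower()
    for tag, text in title.items():
        if tag.split("-")[0].lower() == primary:
            return text
    # 3. Fall back to the default language; None if that is missing too.
    return title.get(default_language)


female = {"en-NZ": "Female", "mi-NZ": "Wahine"}
print(resolve_title(female, "mi-NZ"))
print(resolve_title("Male", "mi-NZ"))
```

A UI generator could call this once per oneOf entry to label a radio button or drop-down option.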

May 28 '19 04:05 stueynz

@stueynz Is this in response to json-schema-org/json-schema-spec#743 ?

May 28 '19 09:05 awwright

Yes... I was in two minds whether it should go on #743 or be a separate issue / feature request.

May 28 '19 20:05 stueynz

A less disruptive option might be to follow RFC 8288's approach of title*

Keeping individual keywords simple and targeted is a good thing in my view, and there are different mindsets involved in monolingual and multilingual environments.

May 28 '19 22:05 handrews

The example you gave, where an interface is generated from a JSON schema, is quite application-specific. If an application wants multilingual support in a schema, I think the proper solution would be for it to define its own vocabulary or work around it in other ways. Putting stuff like this in core only bloats the spec and puts unnecessary burdens on implementations.

May 29 '19 09:05 johandorland

My feeling is the same as @johandorland here. Core should be kept simple, but new extended vocabularies can be written. This will be easier when draft-8 is out, and after that I'm sure I and others will be happy to help start a documentation or i18n annotations vocabulary for documentation and UI generation.

May 29 '19 09:05 Relequestual

JSON Schema is no longer just about payload validation

This is exactly why @handrews put so much effort into adding "Vocabularies"! =]

May 29 '19 09:05 Relequestual

A less disruptive option might be to follow RFC 8288's approach of title*

Two things to note about title* from rfc-8288:

"The title* link-param MUST NOT appear more than once in a given link-value." The whole point of what I'm getting at is that we want to be able to put multiple title entries against an enum value, so we'd be departing from RFC 8288 by permitting multiple title* entries.
RFC 8288 is actually piggybacking off RFC 8187, which encodes both language and non-US-ASCII characters into HTTP header fields, which are restricted to US-ASCII characters.

We don't have the character encoding problem: JSON Schemas are already written in JSON, which implies the use of UTF-8 encoding.

We could leave the existing title tag as a single string, and make title* be the object with multiple values keyed by language tags:

  Gender:
    type: string
    oneOf:
      - const: "M"
        title*: {"en-NZ": "Male", "mi-NZ": "Tāne" }
      - const: "F"
        title*: {"en-NZ": "Female", "mi-NZ": "Wahine" }
      - const: "D"
        title*: {"en-NZ": "Diverse", "mi-NZ": "Ira tāngata kōwhiri kore" }
        description: "Used where gender known, and is non-binary"

Jun 05 '19 03:06 stueynz

Fully agree with @johandorland and @Relequestual

@stueynz To the best of my understanding of REST principles an OpenAPI-Document should first and foremost be treated like a REST resource itself. There is nothing special making it in any way different from the resources it describes. So if some client asks for an OpenAPI document with Accept-Language: en-NZ a server may serve an OpenAPI document fully localized for en-NZ and if a client asks for en-US it may get the document for en-US. If the language header is missing a server may respond with a default locale (setting the Content-Language header). Links to other languages may be represented using Hypermedia.
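
The negotiation described above could look roughly like the sketch below. It is an illustration only, assuming the server keeps one fully localized document per locale; the function names parse_accept_language and choose_locale are hypothetical, and the matching is a simplified prefix lookup rather than a complete implementation of the HTTP language-matching rules.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (language-range, q) pairs, best first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a range without an explicit weight counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their order from the header.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_locale(header: Optional[str], available: Sequence[str], default: str) -> str:
    """Pick which localized document to serve; `default` if nothing matches."""
    by_lower = {loc.lower(): loc for loc in available}
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in by_lower:
            return by_lower[tag]
        # A shorter range such as "en" is allowed to select "en-NZ".
        for lower, loc in by_lower.items():
            if lower.startswith(tag + "-"):
                return loc
    return default


locales = ["en-NZ", "mi-NZ"]
print(choose_locale("mi-NZ, en;q=0.8", locales, "en-NZ"))
```

The chosen locale is what the server would then report in the Content-Language header of the response.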

Just imagine a situation where some of your enum values were permitted for one locale but not applicable to another due to regulatory differences or for whatever reason. There might be any kind of differences in validation constraints based on a particular locale. After all your OpenAPI descriptions should be in the language of your API customer as well.

I'm afraid trying to nail all those cases with special schema syntax is going to become a half-baked fail because it heavily goes against REST principles IMHO.

Sep 01 '19 19:09 about-code

I also agree with @johandorland and @Relequestual that the correct place to do what I want is a dedicated internationalisation vocabulary. Now that draft-8 is out the door, we can perhaps think about it.

@about-code, I agree that the Accept-Language header is how one would query for and receive a fully localised, monolingual edition of the JSON schema.

However:

Human language is a complex tapestry; the proposal is to add a vocabulary allowing for the inclusion of multi-lingual annotations in the schema language; because there is a need.
JSON Schema & OpenAPI are no longer just for the API developers; our open API spec is the contract to define requirements for systems yet to be built; we need a mechanism to allow us to specify multiple translations of the field enumerations and descriptions. Of course we could just slap a few vendor extensions like x-title and x-description in as appropriate.

Oct 16 '19 20:10 stueynz

@brettz9 suggested the following in https://github.com/json-schema-org/json-schema-spec/issues/114:

(Filing as a separate issue as per #53 (comment) )

As with HTML's lang/xml:lang properties, it would be useful to indicate the content language of a particular field in a standard manner. This might be used for proper font display (as in CJK languages) or for selectively showing content to users based on their locale.

I think a name "contentLang" for the property would avoid confusion at (falsely) thinking that this was necessarily indicating the language of the "title" or "description" itself.

Although it is probably not a feature that would be used for validation (since looking at code points might not be reliable or valid for such detection), contentLang does describe the data, placing constraints on how it is to be understood, which brings it more into the world of schemas (unlike i18n of the schema itself).

Note that JSON-LD would not solve this unless one uses JSON-LD in the instance documents (which would prevent the benefit of having such a pseudo-constraint at the schema level).

Let's see if we can solve i18n as a vocab over here.

Jan 15 '20 12:01 philsturgeon

I'd also like to offer a related suggestion if it might fit here---format set to a standard language code (BCP 47).

Apr 04 '21 09:04 brettz9

@brettz9 nothing should ever be added to format again. Start a new set of keywords, don't further burden the most broken one that we haven't been able to nuke yet.

Apr 04 '21 19:04 handrews

Sorry, have some "brain fog" problems, so can make some dumb statements while iterating on my work.

So, I understand the reluctance to create new official format values.

I have some further questions that I'll post below, knowing this is not the right place for further discussion, but I would kindly request your assistance in pointing me to an existing issue where such discussion would be suitable.

But as far as starting a new set of keywords, for those who need to tack on something quickly for their project's needs and don't have the time to work on specifications (or to wait on them to be approved), I think people need a space to add multiple keywords without fear of namespace conflict.

Per your comment at https://github.com/json-schema-org/json-schema-vocabularies/issues/33#issuecomment-539267726 , "The bar for a new core keyword ($-prefix) is very high", which suggests that new core $ keywords may still be added in the future, so it seems unsafe to rely on $ alone for namespace protection.

I found this comment on adding a JSON Schema equivalent to a meta tag interesting, if this might be a kind of open-ended mechanism for adding custom keywords (or whatever may do so).

I really would hope that this or something like it could be prioritized so that one can begin to have multiple formats, and maybe to assist deprecating format as I don't believe the spec makes clear this is expected to happen.

This might even help with vocabulary development if experiments could be turned into standards. The meta approach I think has another aspect worthy of emulation too: having a registry open to anyone, like an equivalent to https://wiki.whatwg.org/wiki/MetaExtensions which could avoid namespace conflicts even for non-community adopted standards, and which allows for semi-formal as well as formal specifications.

Apr 04 '21 23:04 brettz9

Hi! Are there any new developments regarding internationalization of JSON Schema Validation, or minimal / best-practice workaround?

Jan 16 '24 13:01 ThorFjelldalen

@ThorFjelldalen in terms of official localization support, no.

However, with 2020-12, you will get any undefined keywords as annotations. This means that you can include title* keywords for each language that you want to support. Since title results in an annotation anyway, this change in processing the results shouldn't be too difficult.

Your schema from above:

Gender:
  type: string
  oneOf: 
    - const: "M"
      title*: {"en-NZ": "Male", "mi-NZ": "Tāne" }
    - const: "F"
      title*: {"en-NZ": "Female", "mi-NZ": "Wahine" }
    - const: "D"
      title*: {"en-NZ": "Diverse", "mi-NZ": "Ira tāngata kōwhiri kore" }
      description: "Used where gender known, and is non-binary"

could become

Gender:
  type: string
  oneOf: 
    - const: "M"
      title: "Male"
      title@mi-NZ: Tāne
    - const: "F"
      title: "Female"
      title@mi-NZ: Wahine
    - const: "D"
      title: "Diverse"
      title@mi-NZ: Ira tāngata kōwhiri kore
      description: "Used where gender known, and is non-binary"

Here, I'm using title@<locale> for the keys. When evaluating an instance, e.g. {"Gender": "F"}, you'll get the result (in Verbose format):

{
  "valid": true,
  "keywordLocation": "",
  "absoluteKeywordLocation": "https://json-everything.net/438509aca5",
  "instanceLocation": "",
  "annotations": [
    {
      "valid": false,
      "keywordLocation": "/oneOf/0",
      "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/0",
      "instanceLocation": "",
      "errors": [
        {
          "valid": false,
          "keywordLocation": "/oneOf/0/const",
          "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/0/const",
          "instanceLocation": "",
          "error": "Expected \"\\\"M\\\"\""
        }
      ]
    },
    {
      "valid": true,
      "keywordLocation": "/oneOf/1",
      "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/1",
      "instanceLocation": "",
      "annotations": [
        {
          "valid": true,
          "keywordLocation": "/oneOf/1/title",
          "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/1/title",
          "instanceLocation": "",
          "annotation": "Female"
        },
        {
          "valid": true,
          "keywordLocation": "/oneOf/1/title@mi-NZ",
          "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/1/title@mi-NZ",
          "instanceLocation": "",
          "annotation": "Wahine"
        }
      ]
    },
    {
      "valid": false,
      "keywordLocation": "/oneOf/2",
      "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/2",
      "instanceLocation": "",
      "errors": [
        {
          "valid": false,
          "keywordLocation": "/oneOf/2/const",
          "absoluteKeywordLocation": "https://json-everything.net/438509aca5#/oneOf/2/const",
          "instanceLocation": "",
          "error": "Expected \"\\\"D\\\"\""
        }
      ]
    }
  ]
}

You can see there's an entry with "keywordLocation": "/oneOf/1/title@mi-NZ" that has the annotation value you're looking for.
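
Pulling those entries out of the Verbose output is a short tree walk. The sketch below is one possible way to do it in Python, assuming the node shape shown above ("valid", "keywordLocation", nested "annotations" arrays, and an "annotation" value on leaf nodes); collect_titles is a hypothetical helper name, not part of any library.

```python
from typing import Any, Dict, Optional


def collect_titles(node: Dict[str, Any], found: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Gather title and title@<locale> annotations from the valid nodes of a verbose result."""
    if found is None:
        found = {}
    # Failed branches (the non-matching oneOf entries) contribute no annotations.
    if not node.get("valid", False):
        return found
    # The last JSON Pointer segment is the keyword that produced the annotation.
    keyword = node.get("keywordLocation", "").rsplit("/", 1)[-1]
    if "annotation" in node and (keyword == "title" or keyword.startswith("title@")):
        found[keyword] = node["annotation"]
    for child in node.get("annotations", []):
        collect_titles(child, found)
    return found


# Trimmed copy of the verbose result shown above.
result = {
    "valid": True,
    "keywordLocation": "",
    "annotations": [
        {"valid": False, "keywordLocation": "/oneOf/0", "errors": []},
        {
            "valid": True,
            "keywordLocation": "/oneOf/1",
            "annotations": [
                {"valid": True, "keywordLocation": "/oneOf/1/title", "annotation": "Female"},
                {"valid": True, "keywordLocation": "/oneOf/1/title@mi-NZ", "annotation": "Wahine"},
            ],
        },
        {"valid": False, "keywordLocation": "/oneOf/2", "errors": []},
    ],
}

titles = collect_titles(result)
print(titles)
```

The returned dictionary is keyed by keyword name, so a caller can look up "title@mi-NZ" first and fall back to "title".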

The new proposed format is a bit easier to read, but I don't think anyone but me supports it yet since it's not released.

{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything.net/79e565d592#",
  "instanceLocation": "",
  "details": [
    {
      "valid": false,
      "evaluationPath": "/oneOf/0",
      "schemaLocation": "https://json-everything.net/79e565d592#/oneOf/0",
      "instanceLocation": "",
      "errors": {
        "const": "Expected \"\\\"M\\\"\""
      }
    },
    {
      "valid": true,
      "evaluationPath": "/oneOf/1",
      "schemaLocation": "https://json-everything.net/79e565d592#/oneOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "Female",
        "title@mi-NZ": "Wahine"
      }
    },
    {
      "valid": false,
      "evaluationPath": "/oneOf/2",
      "schemaLocation": "https://json-everything.net/79e565d592#/oneOf/2",
      "instanceLocation": "",
      "errors": {
        "const": "Expected \"\\\"D\\\"\""
      }
    }
  ]
}
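
With this shape the lookup becomes simpler, because annotations sit in a plain object on each node. A rough sketch, assuming the structure shown above ("details" arrays and an "annotations" object per node); title_for_locale is a hypothetical helper, and the fallback to the plain title keyword is a design assumption rather than anything the output format prescribes.

```python
from typing import Any, Dict, Optional


def title_for_locale(result: Dict[str, Any], locale: str) -> Optional[str]:
    """Return the title@<locale> annotation, or the plain title if no translation exists."""
    key = f"title@{locale}"
    fallback = None
    stack = [result]
    while stack:
        node = stack.pop()
        # Skip failed subschemas entirely; their annotations do not apply.
        if not node.get("valid", False):
            continue
        annotations = node.get("annotations", {})
        if key in annotations:
            return annotations[key]
        if fallback is None and "title" in annotations:
            fallback = annotations["title"]
        stack.extend(node.get("details", []))
    return fallback


# Trimmed copy of the result shown above.
result = {
    "valid": True,
    "evaluationPath": "",
    "details": [
        {"valid": False, "evaluationPath": "/oneOf/0", "errors": {"const": "..."}},
        {
            "valid": True,
            "evaluationPath": "/oneOf/1",
            "annotations": {"title": "Female", "title@mi-NZ": "Wahine"},
        },
        {"valid": False, "evaluationPath": "/oneOf/2", "errors": {"const": "..."}},
    ],
}

print(title_for_locale(result, "mi-NZ"))
```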

Jan 16 '24 19:01 gregsdennis

Wonderful @gregsdennis! Thanks for the in-depth answer! 🙌

Jan 16 '24 22:01 ThorFjelldalen