Best practices for multilingual values
As evidenced by reports, there is some confusion about how to use multilingual data values alongside language maps. @pchampin noted that using an alias is a good way to work through this, and @BigBlueHat noted (link to minutes forthcoming) that this is the approach taken by Web Annotations. We should offer some examples of this practice, probably in the context of the (long-promised) Primer.
Riffing off of @pchampin's example in https://github.com/w3c/json-ld-syntax/issues/91#issuecomment-445313785, we might use data indexing to aid access:
{
  "@context": {
    "occupation": { "@id": "ex:occupation", "@type": "rdf:HTML", "@container": "@index" },
    "description": "ex:description"
  },
  "name": "Yagyū Muneyoshi",
  "occupation": {
    "ja": "<span lang=\"en\">Ninja in japanese: <span lang=\"ja\">忍者</span></span>",
    "en": "<span lang=\"en\">Ninja in english: <span lang=\"en\">Ninja</span></span>",
    "cs": "<span lang=\"en\">Ninja in czech: <span lang=\"cs\"> Nindža </span></span>"
  }
}
This allows data indexing and consistent use of HTML values.
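For JSON consumers, the point of the index map is direct access by key. Here is a minimal Python sketch, assuming the document is read as plain JSON with no JSON-LD processing at all. `pick_by_key` and its fallback order are hypothetical and not part of any JSON-LD API, and the HTML values are shortened for readability:

```python
# Plain-JSON view of the example document (HTML values shortened).
doc = {
    "name": "Yagyū Muneyoshi",
    "occupation": {
        "ja": "<span lang=\"ja\">忍者</span>",
        "en": "<span lang=\"en\">Ninja</span>",
        "cs": "<span lang=\"cs\">Nindža</span>",
    },
}


def pick_by_key(index_map, preferred, fallback="en"):
    """Hypothetical helper: return the entry for the first key that exists.

    The keys are treated as opaque index keys, not as language tags.
    """
    for key in (*preferred, fallback):
        if key in index_map:
            return index_map[key]
    return None


# Prefer Czech, then Japanese; fall back to "en" if neither is present.
print(pick_by_key(doc["occupation"], ["cs", "ja"]))
```

Nothing here depends on the keys being language tags, which is both the convenience and the source of confusion.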
But... what would the generated RDF look like? One cannot add a language tag to a typed literal:-(
It’s not a language tag, it’s a data index, which has no RDF representation.
It’s useful for creating structural indexes.
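To make the difference concrete, here is a deliberately simplified Python model of what each container type contributes to the generated literals. It is not the JSON-LD to-RDF algorithm: `literals_for` and `escape` are hypothetical helpers, and the Turtle-like strings are for illustration only:

```python
def escape(text):
    """Escape backslashes and double quotes for a Turtle-like string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def literals_for(value_map, container, datatype=None):
    """Simplified model of how map entries turn into RDF literals.

    "@language": the key becomes the literal's language tag
                 ("@none" means no tag).
    "@index":    the key is dropped; only the value and the datatype
                 coerced by the term definition survive.
    """
    literals = []
    for key, value in sorted(value_map.items()):
        if container == "@language":
            tag = "" if key == "@none" else f"@{key}"
            literals.append(f'"{escape(value)}"{tag}')
        elif container == "@index":
            suffix = f"^^{datatype}" if datatype else ""
            literals.append(f'"{escape(value)}"{suffix}')
        else:
            raise ValueError(f"unsupported container: {container}")
    return literals


language_map = {"ja": "忍者", "en": "Ninja"}
index_map = {"ja": "<b>忍者</b>", "en": "<b>Ninja</b>"}

for literal in literals_for(language_map, "@language"):
    print(literal)
for literal in literals_for(index_map, "@index", "rdf:HTML"):
    print(literal)
```

The `@index` keys appear nowhere in the second set of literals, so the map cannot be rebuilt after a round trip through RDF.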
Oops... well, this is one of those surprise effects that @ajs6f was talking about yesterday: I missed the "@container": "@index" and thought it was a language map. I.e., if I am an author not looking into the details of the context, I can be a bit misled.
Yes, it is legal; I do not think it is good practice.
seeAlso https://iiif.io/api/presentation/3.0/#44-html-markup-in-property-values
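When languages are marked inside the value with HTML `lang` attributes, as in the example above, a consumer has to recover the language runs itself. Here is a rough sketch using Python's standard `html.parser`. The `LangRuns` class and `lang_runs` function are hypothetical names, only a handful of void elements are handled, and this is not taken from any specification:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "wbr"}


class LangRuns(HTMLParser):
    """Collect (language, text) pairs, inheriting `lang` from ancestors."""

    def __init__(self, default_lang=None):
        super().__init__()
        self.stack = [default_lang]  # innermost language is last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # A missing lang attribute inherits the enclosing language.
        self.stack.append(dict(attrs).get("lang", self.stack[-1]))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))


def lang_runs(html, default_lang=None):
    parser = LangRuns(default_lang)
    parser.feed(html)
    parser.close()
    return parser.runs


value = '<span lang="en">Ninja in Japanese: <span lang="ja">忍者</span></span>'
for lang, text in lang_runs(value):
    print(lang, repr(text))
```

This is an agreement between producer and consumer; neither JSON-LD nor RDF sees anything but a single string.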
This issue was discussed in a meeting.
RESOLVED: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language
Multilingual Values
Benjamin Young: https://github.com/w3c/json-ld-syntax/issues/105
Benjamin Young: Another easy one ;)
… this one is about how JSON-LD currently works, and our past decisions to use HTML for multilingual values (strings with multiple languages)
… so use straight up HTML, which is not ideal
… Looking at text level semantics HTML, but that’s for the future.
… so what do we need to propose in the primer to close the issue?
… related - there’s no way to do multi-language language maps
Rob Sanderson: it seems we should split this into a primer issue
… eg how do you use language tags
… and what do you do with multiple languages
… and then have a syntax issue around gkellogg’s issue for the normative specs
Benjamin Young: …about, is it an error to have English and Japanese in a string that is stated to be only one of those
Ivan Herman: What was put there by gregg sounds like a solution, but a bit misleading. The use of language tags gives the wrong impression — should be just indexes
Ivan Herman: Language tags are defined by ISO
Rob Sanderson: "<span lang=\"en\">Ninja in japanese: 忍者"@ja
Rob Sanderson: I agree ivan. to your question, the RDF would look like that:
Rob Sanderson: "Ninja in japanese: 忍者"@ja^^rdf:langString
Rob Sanderson: this has been my issue for 5+ years
… language tags must be langString
Ivan Herman: an RDF issue that is not ours to solve
… Lots of nice discussions in dbooth’s repo, but it should happen in RDF not here
… same as missing base direction
… we can only set a single language. And this is the same as base direction, shouldn’t touch it
Rob Sanderson: +1 to ivan
Benjamin Young: RDF is woefully broken in this way, but Gregg’s proposal of HTML + language map would be desirable by JSON developers
Rob Sanderson: https://iiif.io/api/presentation/3.0/#44-html-markup-in-property-values
Benjamin Young: If built to contain HTML, they’re not going to take it into RDF, so a little misuse has advantages
Benjamin Young: our audience is interested in JSON, with a side plate of a graph
Rob Sanderson: I put this link in earlier https://iiif.io/api/presentation/3.0/#44-html-markup-in-property-values
… it uses exactly what gkellogg describes
… it is common and exactly what people want to be able to do
Ivan Herman: The funny thing is what you wrote is legal but ugly RDF – a microsyntax for a string, which is outside of RDF or JSON-LD
… it happens to be a subsyntax of HTML
… don’t need anything in the syntax document to do this, it’s a private agreement between parties
Rob Sanderson: +1 to Ivan
Ivan Herman: this is probably the only thing we can do
… so no issue in the syntax document
… it’s an ugly but best practice given the current technologies
Pierre-Antoine Champin: Going to propose a crazy idea, in the line of what Ivan said. We don’t need to change RDF, we could define a custom datatype. langString is syntactic sugar for a standard datatype for a more ugly microsyntax of the language inside the value
… we could define a more complex but similar datatype. That’s the crazy idea :) We could instrument it in RDF, with another container type, so that what gregg proposed would generate the appropriate structure
… but it’s quite some work
Ivan Herman: technically … yes … and now I put on the W3C hat, it’s outside of our charter. This would be an RDF datatype.
Pierre-Antoine Champin: What about JSON data type?
Ivan Herman: JSON is closer to our charter. But language isn’t.
… it would be a lot of work … the flood gates would be open. Ruby, direction, etc.
Benjamin Young: https://w3c.github.io/string-meta/
Benjamin Young: worth pausing on the JSON data type. I hear the concerns … is there a way around them? This string-meta document from i18n suggests JSON-LD as a solution for multi-language use
… feel that there’s an opportunity here
… And if we miss it, there’ll be a lot of terrible looking JSON-LD
… I see that it evokes process specters, but it comes up a lot
… The genie won’t go back into the bottle. So any hope of this?
Ivan Herman: Don’t remember the issue, but got into a long discussion with the editors. The examples are mostly wrong.
Benjamin Young: https://github.com/w3c/string-meta/issues/27
Benjamin Young: also https://github.com/w3c/string-meta/issues/13
Ivan Herman: I understand the problem. Would love for the problem to be solved, but outside our influence
Benjamin Young: oh…and https://github.com/w3c/string-meta/issues/23
Ivan Herman: I don’t see any other proper way, other than having it done at the RDF level.
Benjamin Young: …and another https://github.com/w3c/string-meta/issues/11
Rob Sanderson: The bigger risk is to build on shifting sands and have RDF come up with a different syntax that’s incompatible with whatever we come up with
… should instead use it as a way to highlight the need, and potentially a micro-chartered group to solve it for RDF
Benjamin Young: Not ready to recharter, or make a new datatype. Rob proposes to kick it to another group and then an update to JSON-LD. Not a solution, but don’t want to lose the actions
… to close the issue we should state what can be done
… but need to be clear as to what /should/ be done that’s not confusing
Jeff Mixter: +1 to that
Proposed resolution: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language (Benjamin Young)
Rob Sanderson: +1
Benjamin Young: +1
Jeff Mixter: +1
Ivan Herman: +1
Tim Cole: +1
Pierre-Antoine Champin: +1
Simon Steyskal: +1
Adam Soroka: +1
Resolution #5: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language
Ivan Herman: procedural question - if we close the issue, then I think we will lose it for the bp doc. For the time being we don’t have an editor for the document. So don’t want it lost.
… should be raised in the BP repo
Rob Sanderson: +1
Benjamin Young: +1
Ivan Herman: should go through the issues to make sure we don’t lose them
Benjamin Young: Agreed – open editorial issues on BP?
… keep these initial discussions in the syntax doc, to not have the comments scattered
Ivan Herman: Wouldn’t close this one
Simon Steyskal: https://github.com/w3c/json-ld-bp/issues
Benjamin Young: not until there’s another issue to write it up
Ivan Herman: editor will write it up as they see best
Benjamin Young: And it’s the top of the hour
… thanks for all the input
Do we need a short new section on multilingual value issues?
Possible routes:
- Language maps -- needs the value to be langString
- Node: i18n Text Direction nodes / anno Textual Body / crm LinguisticObject
- Embedded HTML values -- not common practice
- Data indexing -- works, but doesn't survive round tripping through RDF
Other topics to include:
- Discussion of @none as a not-language.
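As a sketch of the language-map route together with `@none`: in JSON-LD 1.1 a language map may use the `@none` key for a string that has no language. The Python below assumes a hypothetical `label` term and a hypothetical `pick_language` helper. It reads the compact JSON only and matches tags case-insensitively, with no further BCP 47 matching:

```python
doc = {
    "@context": {
        # Hypothetical term; "ex" stands for whatever vocabulary is in use.
        "label": {"@id": "ex:label", "@container": "@language"}
    },
    "label": {
        "ja": "忍者",
        "en": "Ninja",
        "@none": "shinobi",  # a string with no declared language
    },
}


def pick_language(language_map, wanted):
    """Return (tag, value) for the first wanted tag that is present.

    Falls back to the "@none" entry, reported with tag None, and returns
    None when nothing matches at all.
    """
    lowered = {key.lower(): (key, value) for key, value in language_map.items()}
    for tag in wanted:
        if tag.lower() in lowered:
            return lowered[tag.lower()]
    if "@none" in lowered:
        return (None, lowered["@none"][1])
    return None


print(pick_language(doc["label"], ["cs", "JA"]))
print(pick_language(doc["label"], ["fr"]))
```

Unlike data indexing, the language tags used as keys here do end up on the RDF literals, and the `@none` entry becomes a plain string.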
This issue was discussed in a meeting.
- No actions or resolutions
Multilingual Patterns
Rob Sanderson: https://github.com/w3c/json-ld-bp/issues/5
Rob Sanderson: adam had noted that there is some confusion about how to use multilingual data values alongside language maps
Ivan Herman: I think two things are intertwined here
… the first is the use of language map, possibly with direction,
… the second is the use of HTML literals.
… I would prefer to separate them in BP.
… gkellogg’s proposal was a hack to use almost the same syntax for two cases,
… which is pretty convoluted. It works, but should this be BP?
Gregg Kellogg: in one case, this is a language map; in the other case, this is data indexing.
Ivan Herman: yes, but using language tags for data-indexing is misleading.
… It misled me.
Gregg Kellogg: language maps reflect in the RDF abstract syntax; data indexing is lost in the process.
Ivan Herman: the example is convoluted because it uses rdf:HTML,
… which I don’t think is very frequent.
Rob Sanderson: should we also discuss @none in this context?
Ivan Herman: yes