OBOFoundry.github.io

Should we recommend specifying language tag?

Open mcourtot opened this issue 6 years ago • 27 comments

As per https://groups.google.com/forum/#!topic/obo-discuss/_x1MpwAjHQw, from Peter Midford:

When entering a string for a definition or a synonym in Protege, I'm confronted with choosing a type (xsd:string) or a language (en), but it seems only one is allowed. Looking around in NBO, which I'm updating, it looks like type wins over language. So, my question is whether specifying type or language is the better practice in the OBO community.
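
The either/or choice follows from the RDF 1.1 data model: a literal carries a language tag only when its datatype is rdf:langString, and a literal with neither a tag nor a datatype is treated as xsd:string. A minimal sketch of that constraint, where the `Literal` class and its field names are hypothetical and not taken from Protege or any library:

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical, simplified model of an RDF 1.1 literal."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def __post_init__(self):
        # A language tag implies rdf:langString, so it cannot be combined
        # with xsd:string or any other explicit datatype.
        if self.lang is not None and self.datatype not in (None, RDF_LANGSTRING):
            raise ValueError("a language-tagged literal cannot carry another datatype")

    @property
    def effective_datatype(self) -> str:
        # Language-tagged literals are rdf:langString; plain ones are xsd:string.
        if self.lang is not None:
            return RDF_LANGSTRING
        return self.datatype or XSD_STRING
```

Under this model, `Literal("assay", lang="en")` is accepted, while `Literal("assay", datatype=XSD_STRING, lang="en")` raises a `ValueError`.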

mcourtot avatar Sep 12 '17 13:09 mcourtot

Relates to #325 and #437

nlharris avatar Apr 13 '20 20:04 nlharris

can someone answer @mcourtot's question?

nlharris avatar Dec 01 '20 01:12 nlharris

I'd vote yes. It's the standard, and doing it removes one thing in the way of adding translations. Alan

alanruttenberg avatar Dec 01 '20 03:12 alanruttenberg

I agree with @alanruttenberg that a language tag is better than xsd:string for labels, definitions, synonyms, etc.

In practise, I see more xsd:strings than language tags but we should push to use language tags.

jamesaoverton avatar Dec 01 '20 13:12 jamesaoverton

I agree. I think xsd:string is essentially redundant - there is no good practical reason to annotate strings with xsd:string. I also believe language tags are the way to go here. Just getting the tooling to handle that will be quite a challenge.

matentzn avatar Dec 01 '20 13:12 matentzn

I agree as well. A language tag is better than xsd:string.

yongqunh avatar Dec 01 '20 15:12 yongqunh

does this recommendation still need to be added somewhere?

nlharris avatar Jun 08 '21 19:06 nlharris

I added the Operations Committee tag to put this up for a vote.

I think it's straightforward to vote that we want to use language tags over xsd:string. However, the big question mark is what we want to recommend when comparing "nothing" to @en - there will be a lot of screams of agony if we require all English language labels, definitions etc to get an @en tag. But maybe that's the way to go to break the dominance of the English language in a truly global world! I would vote for it, and I would volunteer to help the Foundry ontologies migrate. However, there are voices (I am sure @cmungall is one of them) that would say that "@en" on all literals will confuse the users :D But even here - we could say: use @en everywhere, and if your users are confused, export a version of your ontology without language tags. So, two votes:

Suggestion: Recommend using language tags instead of xsd:string, and add the recommendation to the "common format" principle. This will require changes to the obo2owl format parser and some work on the curation side. We won't require language tags across the board (gene names, people's names, xrefs etc, thanks @alanruttenberg ), but ROBOT report will produce a warning if a class in an ontology has a label, synonym or definition that does not have a language tag.

  • 🚀 Yes. It's hard work, but we can overcome the technical challenges and I think it's worth it.
  • 👎 No. It's not worth it at this moment, not unless someone is being paid to do it.

matentzn avatar Jun 09 '21 07:06 matentzn

voices (I am sure @cmungall is one of them) that would say that "@en" on all literals will confuse the users

I am all for not confusing users, but I am not sure how this would confuse users, most of whom interact via OLS etc

All seems reasonable on the surface. I think the challenge is with the tooling, not policy. Provide people tools and they will do the right thing.

The main tooling need is in the obo2owl code. If standard SPARQL updates are provided in ODK/ROBOT then it will be easier for maintainers to migrate. But ideally this would be in the OWL API conversion code. That way there is no confusion from the edit version being different OWL than the release version. I don't think adding this to the OWL API is so hard, but someone needs to manage the migration process.

I also think you need to give clear guidance on how to migrate. Many ontologies may use Latin terms. Doing a replace-all of string to @en will yield incorrect results. Unless we consider a Latin term to be acceptable in formal English-speaking contexts? Or maybe we should require two labels? Or one label plus an exact synonym?

There are methods to infer whether a term is English or Latin, but this is work we would be putting on ontology developers, many of whom have to balance limited resources against actual requests from curators rather than formal ontologists.
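
To make the replace-all concern concrete, here is a minimal sketch of a more cautious migration pass. It is not an existing ODK/ROBOT feature: the `Label` tuple, the `migrate_labels` function and the curator-supplied `needs_review` set are all hypothetical. Untagged string labels become @en unless a curator has listed them for manual review, in which case they are held back rather than guessed at.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class Label(NamedTuple):
    text: str
    datatype: Optional[str]  # e.g. "xsd:string", or None if absent
    lang: Optional[str]      # e.g. "en", or None if absent


def migrate_labels(
    labels: Iterable[Label], needs_review: Set[str]
) -> Tuple[List[Label], List[Label]]:
    """Return (migrated, flagged); flagged labels await a curator decision."""
    migrated: List[Label] = []
    flagged: List[Label] = []
    for label in labels:
        untagged_string = label.lang is None and label.datatype in (None, "xsd:string")
        if not untagged_string:
            # Already language-tagged, or not a string at all: leave as-is.
            migrated.append(label)
        elif label.text in needs_review:
            # Possibly Latin (or otherwise not English): do not guess.
            flagged.append(label)
        else:
            migrated.append(Label(label.text, None, "en"))
    return migrated, flagged


labels = [
    Label("heart", "xsd:string", None),
    Label("Homo sapiens", "xsd:string", None),
    Label("Krankheit", None, "de"),
]
migrated, flagged = migrate_labels(labels, needs_review={"Homo sapiens"})
```

Here "heart" is retagged as @en, "Krankheit"@de passes through unchanged, and "Homo sapiens" is returned in `flagged` so that the policy questions above (Latin as English, two labels, label plus exact synonym) can be settled by a person.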

cmungall avatar Jun 14 '21 22:06 cmungall

In some of our UN work, language tags are very desirable for obvious reasons, and more interoperability efforts are also asking for multilingual support. Supportive of the language tag.

pbuttigieg avatar Jun 15 '21 16:06 pbuttigieg

I'm for the language tag when appropriate. That's the standard. It isn't appropriate for things that aren't language specific, such as the short name of a gene or protein. I'm not sure it is appropriate for names of people. It isn't appropriate for a value that is an IRI. It may be something that can be added by ROBOT so as not to confuse the poor biologists.

alanruttenberg avatar Jun 15 '21 17:06 alanruttenberg

Ok the vote is now ready: a simple yes or no question: https://github.com/OBOFoundry/OBOFoundry.github.io/issues/479#issuecomment-857467300

matentzn avatar Jun 15 '21 17:06 matentzn

Ok the vote is now ready: a simple yes or no question: #479 (comment)

Not sure if this is planned to become more common but I like the idea that votes take place via github issues / comments.

cthoyt avatar Jun 15 '21 18:06 cthoyt

Open action items:

  • [x] Finalise the vote
  • [ ] Find resources to extend the OBO format parser to handle language tags correctly (alternatively, we can recommend stripping them from rdfs:label annotation prior to release using ODK)

matentzn avatar Jul 13 '21 08:07 matentzn

Looks like the vote so far is 3 yes and 2 "thumbs up" (not mentioned as a voting option, but I think we can assume those are also yeses).

nlharris avatar Feb 14 '22 19:02 nlharris

Ok, the outcome of the vote here is that we start recommending language tags in place of DT(string) for labels.

Next steps:

  • [ ] Add this to common format principle OBO (@nataled @lschriml @nicolevasilevsky EWG)
  • [ ] Making an announcement on OBO discuss about new OLS support (OFOC call)
  • [ ] Start building QC checks (probably our team)

matentzn avatar Feb 15 '22 09:02 matentzn

Not a pushback on the outcome of the vote, but on the process. One of the early decisions regarding new or changed principles is that they are discussed and voted on in an Operations call before wording is added by the EWG. I see that there was discussion of this during a call, but it's not clear to me that a vote was taken during a call. Has that happened?

nataled avatar Feb 15 '22 15:02 nataled

Sure, we can raise this one more time at the OFOC! Makes sense. I don't remember exactly what happened with respect to the discussion there. So best to just finalise the decision next Tuesday! Thank you @nataled

matentzn avatar Feb 15 '22 15:02 matentzn

Is there any guidance about what language tags are good, and which might be malformed? We're looking into permitted language variants over in https://github.com/FoodOntology/joint-food-ontology-wg/issues/25

ddooley avatar May 12 '22 14:05 ddooley

https://www.w3.org/International/questions/qa-choosing-language-tags It cites https://www.rfc-editor.org/rfc/rfc5646.txt as the most current RFC.
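
RFC 5646 distinguishes well-formed tags (correct syntax) from valid ones (every subtag registered). A rough syntax-only check might look like the sketch below. The pattern is a deliberately simplified subset of the RFC grammar (language, optional script, optional region, optional variants), the function name is hypothetical, and it does not consult the subtag registry, so it can catch malformed tags but cannot confirm that a tag is valid.

```python
import re

# Simplified subset of the RFC 5646 "langtag" grammar. Extended language
# subtags, extensions, private-use subtags and grandfathered tags are
# deliberately not covered.
_LANGTAG = re.compile(
    r"[A-Za-z]{2,3}"                                  # primary language, e.g. "en"
    r"(-[A-Za-z]{4})?"                                # script, e.g. "Hant"
    r"(-(?:[A-Za-z]{2}|[0-9]{3}))?"                   # region, e.g. "GB" or "419"
    r"(-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"   # variants, e.g. "1996"
)


def looks_well_formed(tag: str) -> bool:
    """Syntax-only check; does not verify subtags against the registry."""
    return _LANGTAG.fullmatch(tag) is not None
```

With this pattern, tags such as `en`, `en-GB`, `zh-Hant-TW` and `es-419` pass, while `english`, `en_GB` and an empty string are rejected.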

alanruttenberg avatar May 19 '22 17:05 alanruttenberg

@nataled Action items:

https://github.com/OBOFoundry/OBOFoundry.github.io/issues/479#issuecomment-1040038633

matentzn avatar Jun 27 '23 17:06 matentzn

A couple thoughts, and hopefully I am not stirring a hornet's nest:

  1. Should we be specific about which annotation properties this applies to? rdfs:label, skos:prefLabel, certain things from OMO?
  2. In some cases as Alan R. points out, xsd:string is absolutely the correct type. For example when annotating non-IRI identifiers such as RxCui on classes. Those identifiers are strings of numerals (and I would argue not numbers, but that's not important right now).

hoganwr avatar Jun 27 '23 17:06 hoganwr

@hoganwr based on https://github.com/OBOFoundry/OBOFoundry.github.io/issues/479#issuecomment-1040038633 it should be applied to label only (at least for now).

@matentzn I'm going to need some text along with specific instructions to be provided to users. Best if these instructions include directions for both OWL and OBO formats, but if you don't know the latter I can probably figure it out using an OWL-to-OBO converter.

I should mention that I'm becoming increasingly concerned that we are overloading the principles with directives that are quite ancillary to the principle at hand. This language tag thing, for example, while referring to format, is not really related to the format principle, which is about the overall artifact format (OBO, OWL, JSON, etc) and not about specific fields. I'm thinking we need to separate principles from specific details. I plan on raising this issue in an OFOC call.

nataled avatar Jun 27 '23 20:06 nataled

great point re: principles vs. specifics. As a fellow editorial WG member, this is just one more time where the match between a principle and a particular directive like this one is not obvious. I support discussion of splitting, what it means, and how to implement it.

hoganwr avatar Jun 27 '23 20:06 hoganwr

There will be some iterations on this on the PR @nataled but you can start with this:

- For rdfs:label and IAO:0000115 annotation assertions, we discourage the use of datatype declarations such as `xsd:string`. It is important to note that `xsd:string` is essentially redundant in OWL/RDF, so "assay" and "assay"^^xsd:string should be the exact same thing. However, because a lot of tooling may be confused by the difference, the xsd:string datatype assertion SHOULD be omitted in general for all annotations, and MUST be omitted for rdfs:label and IAO:0000115.
- To designate rdfs:label and IAO:0000115 annotations in a language other than English, a [valid RDF language tag](https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal) MUST be specified, for example, "Krankheit"@de.
- rdfs:label and IAO:0000115 annotation assertions for English content MAY be annotated with an English language tag. If the ontology chooses not to use language tags, a protege:defaultLanguage assertion MUST be added as an ontology annotation.
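
A minimal sketch of how a QC check for these draft rules might look, under stated assumptions: the `Annotation` tuple and `check_annotations` function are hypothetical and not part of ROBOT, "the ontology chooses not to use language tags" is read as "at least one rdfs:label or IAO:0000115 value has no tag", and the second bullet is not checked because the language of an untagged string cannot be determined mechanically here.

```python
from typing import List, NamedTuple, Optional, Sequence

# Properties for which the draft says xsd:string MUST be omitted.
STRICT_PROPERTIES = {"rdfs:label", "IAO:0000115"}


class Annotation(NamedTuple):
    prop: str
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def check_annotations(
    annotations: Sequence[Annotation], default_language: Optional[str] = None
) -> List[str]:
    """Return findings; default_language stands in for protege:defaultLanguage."""
    findings: List[str] = []
    for a in annotations:
        if a.datatype == "xsd:string":
            # MUST be omitted for labels/definitions, SHOULD be omitted elsewhere.
            level = "ERROR" if a.prop in STRICT_PROPERTIES else "WARN"
            findings.append(f"{level}: {a.prop} {a.value!r} declares xsd:string")
    has_untagged = any(
        a.prop in STRICT_PROPERTIES and a.lang is None for a in annotations
    )
    if has_untagged and default_language is None:
        findings.append(
            "ERROR: untagged labels or definitions but no protege:defaultLanguage"
        )
    return findings
```

For example, an ontology whose only annotation is `Annotation("rdfs:label", "assay")` passes when `default_language="en"` is supplied and fails without it.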

matentzn avatar Jun 28 '23 16:06 matentzn

@matentzn I'm confused. The votes and discussion suggest use of language tags, but the text you suggest effectively says not to use them for English.

alanruttenberg avatar Jul 01 '23 21:07 alanruttenberg

@alanruttenberg Good point, I forgot to add a note about that. I made it a bit less restrictive now, and added a third bullet on how to deal with English.

matentzn avatar Jul 02 '23 08:07 matentzn