
replace "IRI" in spec language?

Open dhh1128 opened this issue 4 years ago • 9 comments

Per a suggestion in here, I wanted to suggest an update to the RDF spec language.

The heavy use of "IRI" as terminology in the RDF spec, referencing RFC 3987, raises a number of thorny issues and actually makes RDF out-of-sync with the latest developments at W3C. See https://www.w3.org/International/wiki/IRIStatus. (Also, the status of RFC 3987 has never moved past PROPOSED.)

If the spec continues to use the term, then there should probably be a section added to the spec to explain how RDF proposes to solve the IRI problems such as inconsistent use of punycode and percent encoding. I suspect it would be simpler to use "URL" everywhere, per W3C recommendations, with a simple (foot)note explaining that the intent of "URL" is to encompass internationalization as originally envisioned by the IRI effort, but will track the work of the new URL working group for the particulars.
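To make the punycode versus percent-encoding inconsistency concrete, here is a minimal Python sketch. The host name `bücher.example` is a made-up example, and the sketch only shows that the same non-ASCII label has two different ASCII spellings; it does not claim which one any given RDF tool produces.

```python
from urllib.parse import quote

# Hypothetical internationalized host name, used only for illustration.
host = "bücher.example"

# Punycode (IDNA) form, the spelling used for DNS lookups.
punycode_host = host.encode("idna").decode("ascii")

# Percent-encoded UTF-8 form, the spelling used in other URI components.
percent_host = quote(host)

print(punycode_host)
print(percent_host)
```

Both strings stand for the same label, yet compared character by character they are unrelated, which is the kind of ambiguity a spec section on IRIs would have to address.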

dhh1128 avatar Jun 24 '20 20:06 dhh1128

What data is there from the field about problems that actually occur?

Like any standard in use at scale, some things are less than perfect. It is the practical impact that matters.

https://www.w3.org/International/wiki/IRIStatus raises issues about encoding of Internationalized Domain Names and presentation of Bidirectional Language.

The grammar for IRIs in RFC 3987 over Unicode code points is solid.

RDF 1.1 uses IRIs, but it doesn't define them, create them, or encode or decode them. To some extent, it's garbage-in-garbage-out like any other data. The Unicode string must conform to the RFC 3987 grammar, and what it refers to must be consistent between the creator and any app receiving the data.

RDF 1.0, which uses the problematic-but-necessary-at-the-time "RDF URI References", is fortunately in the past.

The %XX issues are not specific to IRIs; they apply to URIs as well, %7E for example. RFC 3986 section 2.3 says "don't do that" and adds that normalization should put the real character in. There is discussion in RDF spec section 3.2.
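A minimal sketch of the normalization RFC 3986 describes, assuming a plain string-rewriting approach: percent-escapes that stand for unreserved characters are replaced by the character itself, and all other escapes are kept (with hex digits uppercased, which is the RFC's case normalization). The function name `normalize_unreserved` and the example URIs are made up for illustration.

```python
import re

# Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Decode %XX escapes of unreserved characters; keep all other escapes."""

    def repl(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch  # e.g. %7E becomes ~
        return "%" + match.group(1).upper()  # e.g. %2f stays escaped as %2F

    return _PCT.sub(repl, uri)


print(normalize_unreserved("http://example.org/%7Euser"))
```

Before this step, `http://example.org/%7Euser` and `http://example.org/~user` are different strings, so a producer that wants them treated as one identifier would presumably need to apply something like this before publishing the data.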

Putting in text to explain IRIs, and adding to section 3.2 guided by field experience so that it focuses on what comes up for real, seems to me the way to go.

afs avatar Jul 02 '20 21:07 afs

Agree with @afs. Another thing that might be worth mentioning in section 3.2 is that IRIs might not work well with dereferencing; see section 6.1 in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3198968 for details.

jimkont avatar Jul 03 '20 08:07 jimkont

> What data is there from the field about problems that actually occur?

I have a problem with the actual word IRI, which is understood by exactly none of the developers I've spoken with about the term over my career. Just an anecdote, but I'm pretty confident I'm not alone with that experience. The term IRI itself is not a huge barrier to entry, but given how impenetrable RDF already is, every micron we can chip off that barrier is going to help.

It's also worth noting the repercussions the IRI usage in RDF has on the RDF ecosystem, where JSON-LD is unable to remove the term (w3c/json-ld-syntax#355) since it's just a serialization of RDF and thus not within its scope or power to remove.

As an answer for why "IRI" should be replaced with "URL", I think the answer can be found in this very repository's README:

> 3. Backward compatibility is highly desirable, but less important than ease of use.

asbjornu avatar Sep 26 '20 23:09 asbjornu

One concern with using "URL" over "IRI" is that in most cases identifiers should not be URLs at all, but rather other types of URIs, such as URNs, tag URIs, example URIs, or even sometimes file URIs. The overuse of URLs when URNs would work well enough is not very helpful.

Also, I'm unsure what process resulted in https://www.w3.org/International/wiki/IRIStatus; it seems like it was mostly written by one person.

aucampia avatar May 22 '22 23:05 aucampia

An easy solution is to introduce the notion of "RDF identifier", to define it (at first mention) as IRI and to replace all other IRI references with "RDF identifier". No change in meaning, but the term will be understood.

chiarcos avatar May 23 '22 01:05 chiarcos

@aucampia

> Some concerns with using "URL" over "IRI" is that in most cases identifiers should not be URLs at all, but rather other types of URIs, such as URNs, or tag URIs (...)

I believe that the advocates of replacing IRI with URL are considering https://url.spec.whatwg.org/ as the new reference for URL. And in that spec, the notion of URL encompasses all IANA URI schemes:

> A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.). Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]

quoted from https://url.spec.whatwg.org/#url-writing
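The quoted rule can be sketched as a regular expression. This is only a transcription of the sentence above into Python, not code from the WHATWG spec, and the helper name `is_url_scheme_string` is made up for illustration; it also says nothing about whether a scheme is actually registered with IANA.

```python
import re

# One ASCII alpha, then zero or more of ASCII alphanumeric, "+", "-", ".".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_url_scheme_string(candidate: str) -> bool:
    """Return True if the whole string matches the quoted scheme rule."""
    return _SCHEME.fullmatch(candidate) is not None


# Schemes that are not "locators" in the older sense still pass the syntax check.
for scheme in ["https", "urn", "tag", "file", "1http", "my_scheme"]:
    print(scheme, is_url_scheme_string(scheme))
```

Under this rule `urn`, `tag`, and `file` are all syntactically acceptable, which is the sense in which that spec's notion of URL is not limited to schemes like `http`.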

pchampin avatar May 23 '22 08:05 pchampin

> I believe that the advocates of replacing IRI with URL are considering https://url.spec.whatwg.org/ as the new reference for URL. And in that spec, the notion of URL encompasses all IANA URI schemes

You are indeed correct. I'm quite ambivalent about the effort there: re-purposing URL to mean something different from what it meant in a previously ratified internet standard won't do much to alleviate confusion, it will just add to it. That spec has also yet to publish a grammar, and it is very heavily geared towards browsers. I think whatever problems existed before are being somewhat compounded.

I'm also really unsure to what extent the complexity introduced by the concept of a URI or even an IRI is actually making things difficult.

aucampia avatar May 23 '22 09:05 aucampia

I would say that, at the very least, for https://url.spec.whatwg.org/ to be considered a candidate, it should have a grammar and not just a parsing algorithm written in English.

aucampia avatar May 23 '22 09:05 aucampia

URIs and original URLs are ASCII so that they can go in the HTTP request's request-uri. (UTF-8 often works anyway nowadays, and it is also hidden by browsers.) This does lead to confusion as to whether e.g. %7E is really some character (~) or is really %-7-E (encoding != escaping).

RFC 3986 refers to ALPHA, and RFC 2234 (ABNF) defines ALPHA as A-Z a-z.

RDF 1.0 has "RDF URI References", which anticipated IRIs, except that IRIs ended up slightly different. RDF URI references allow spaces.

Adding a new term for clarity in any revised RDF makes a lot of sense to me.

afs avatar May 23 '22 09:05 afs