Invalid URLs are being passed to RDF Turtle output

Open ebremer opened this issue 2 years ago • 4 comments

Invalid URLs are also breaking the Turtle output. In the example "https://orcid.org/0000-0003-3039-2116", the user has a URL specified as

https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:\"Van de Ven, Annelies\"

This is passed back with text/turtle as

<https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies">

which is invalid and throws an error when read by Apache Jena, even though ORCID used Jena to generate the RDF. It's not something Jena will "fix", as per:

https://issues.apache.org/jira/browse/JENA-2351 and https://github.com/apache/jena/issues/1879

Spaces and double quotes are illegal in an IRI.

ebremer avatar May 24 '23 17:05 ebremer

@TomDemeranville Any thoughts?

wjrsimpson avatar Dec 07 '23 18:12 wjrsimpson

Hi @ebremer. Thanks for reporting this. I think I understand the problem here. However, I've read through the issues you've linked to and I'm not sure I understand the solution. What should it do?

TomDemeranville avatar Dec 08 '23 10:12 TomDemeranville

At a minimum, emit the URI as a string rather than as an invalid URI. Preferably, rewrite the URI to make it legal, but not all sites will accept a corrected version, so I understand that this becomes problematic.
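
To illustrate the "rewrite" option, here is a minimal Python sketch that percent-encodes the characters that are not legal in a URI. It is not taken from the ORCID code base: the helper name `percent_encode_url` and the choice of "safe" characters are assumptions made for this example, and as noted above there is no guarantee that the target site will resolve the rewritten address.

```python
from urllib.parse import quote

# Reserved URI characters (RFC 3986 gen-delims and sub-delims) plus "%",
# so that separators and already-encoded sequences are left untouched.
# This set is an assumption for the sketch, not an ORCID setting.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def percent_encode_url(url: str) -> str:
    """Hypothetical helper: percent-encode spaces, quotes, and other
    characters that may not appear literally in a URI."""
    return quote(url, safe=_SAFE)


bad = 'https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies"'
print(percent_encode_url(bad))
```

With this approach the spaces become `%20` and the double quotes become `%22`, so the result can be written between `<` and `>` in Turtle.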

ebremer avatar Dec 08 '23 19:12 ebremer

Anything that is an invalid URI could be handled like this:

"https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:\"Van de Ven, Annelies\""^^xsd:anyURI

see: https://www.w3.org/TR/xmlschema11-2/#anyURI
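
As a rough sketch of that fallback (written in Python for brevity, not as the Jena-based code ORCID actually runs), the serializer could check the value against the characters that the Turtle grammar excludes from an IRI reference and emit a typed literal when any are present. The helper name `turtle_term` is hypothetical, the check is purely syntactic rather than a full IRI validation, and the output assumes the `xsd:` prefix is declared in the document.

```python
import re

# Characters the Turtle grammar excludes from IRIREF (the text between < and >):
# U+0000 through U+0020 (controls and space) plus < > " { } | ^ ` \
_ILLEGAL_IN_IRIREF = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def turtle_term(url: str) -> str:
    """Hypothetical helper: return <url> when it is a legal IRIREF,
    otherwise a string literal typed as xsd:anyURI."""
    if not _ILLEGAL_IN_IRIREF.search(url):
        return f"<{url}>"
    # Escape for a Turtle string literal: backslashes first, then quotes
    # and line-breaking characters.
    escaped = (
        url.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"^^xsd:anyURI'


bad = 'https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies"'
print(turtle_term(bad))
print(turtle_term("https://orcid.org/0000-0003-3039-2116"))
```

Valid URLs still come out as IRIs, so only the problematic values change shape for consumers.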

ebremer avatar Dec 14 '23 16:12 ebremer