ProvToolbox
URI Check and File URI syntax
ProvToolbox does not check the syntax of URIs. For instance, when a PROV-N document is read, whatever occurs between the angle brackets in:
prefix foo <some-uri-here>
is treated as a URI, but no check is performed that this string complies with the URI syntax.
It would be desirable for ProvToolbox to perform a syntactic check and issue a warning if the URI syntax is not valid. One possibility is to use the Apache Jena IRI library:
<dependency>
  <groupId>org.apache.jena</groupId>
  <artifactId>jena-iri</artifactId>
  <version>3.0.0</version>
</dependency>
One point where this check could occur is when a QualifiedName is created: in addition to checking the syntax of the QualifiedName itself, we could also check the URI it denotes.
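As a rough illustration of such a check, here is a minimal sketch that uses only the JDK's java.net.URI as a stand-in for jena-iri (so it needs no extra dependency). The class name UriSyntaxCheck and the method check are hypothetical and not part of ProvToolbox. java.net.URI follows an older URI specification than jena-iri does, so the two may disagree on edge cases. Treating a non-absolute namespace URI as worth a warning is also an assumption made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not part of ProvToolbox.
class UriSyntaxCheck {

    /**
     * Returns a warning message if the candidate string is not a
     * syntactically valid, absolute URI; returns empty otherwise.
     * Nothing is thrown, so a caller can log the warning and carry on.
     */
    static Optional<String> check(String candidate) {
        try {
            URI uri = new URI(candidate); // throws on illegal syntax
            if (!uri.isAbsolute()) {
                // Assumption: a namespace URI is expected to carry a scheme.
                return Optional.of("URI has no scheme: " + candidate);
            }
            return Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.of("Invalid URI syntax: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.org/ns#",   // well-formed and absolute
            "http://example.org/a b",   // space is not allowed
            "some-uri-here"             // parses, but has no scheme
        };
        for (String s : samples) {
            System.out.println(s + " -> " + check(s).orElse("ok"));
        }
    }
}
```

A call such as UriSyntaxCheck.check(namespaceUri) could be made wherever a QualifiedName is constructed, with any returned message logged as a warning rather than raised as an error.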
A further issue related to URIs concerns Java's handling of file URIs. Java represents a file URI as follows:
file:/my/path/to/a/file.ext
whereas the URI syntax is
file:///my/path/to/a/file.ext
This is probably due to Java following an older URI specification (java.net.URI is documented as implementing RFC 2396).
A point where this matters is in prov-rdf, where a relative URI is converted into an absolute URI by using a base URI. When the base URI is not declared in the Turtle file, it is taken to be the file URI. At that point, Java generates a file URI with the wrong syntax. This needs to be fixed, since we have seen in some applications that interoperability with RDF packages such as Jena becomes problematic.
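One way to work around the file URI form is to rewrite it before it is used as a base URI. The following is a minimal sketch under that assumption. The class FileUriForm and the method toTripleSlash are hypothetical names, not existing ProvToolbox or prov-rdf code, and the sketch only handles the single-slash form shown above.

```java
import java.net.URI;

// Hypothetical helper, not part of ProvToolbox.
class FileUriForm {

    /**
     * Rewrites a "file:/path" URI string into the "file:///path" form.
     * Anything else, including a URI that already starts with "file://",
     * is returned unchanged.
     */
    static String toTripleSlash(URI uri) {
        String s = uri.toString();
        if (s.startsWith("file:/") && !s.startsWith("file://")) {
            // Keep the leading "/" of the path and add an empty authority.
            return "file://" + s.substring("file:".length());
        }
        return s;
    }

    public static void main(String[] args) {
        URI javaStyle = URI.create("file:/my/path/to/a/file.ext");
        System.out.println(toTripleSlash(javaStyle));
    }
}
```

Applying the rewrite only where the base URI defaults to the file location would keep the change local to prov-rdf and leave URIs declared in the document untouched.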
A further, secondary issue is that when the Turtle file https://gist.githubusercontent.com/lucmoreau/b70a40401d933c01282f/raw/e55ee935208d0a571a4f14425e6dbbb5656ef1da/gistfile1.txt, which contains a relative URI, is processed by the provenance translator https://provenance.ecs.soton.ac.uk/validator/view/translator.html, the output leaks a working directory.