
URI Check and File URI syntax

Open lucmoreau opened this issue 9 years ago • 0 comments

ProvToolbox does not perform any check on the syntax of URIs. For instance, when a PROV-N document is read, whatever occurs between angle brackets:

prefix foo <some-uri-here>

is considered to be a URI, but no check is performed that this string complies with the URI syntax.

It would be desirable for ProvToolbox to perform a syntactic check and to issue a warning if the URI syntax is not valid. One possibility is to use the jena-iri library:

<dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-iri</artifactId>
    <version>3.0.0</version>
</dependency>
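As a hedged sketch of such a check, the class below uses the JDK's own java.net.URI parser instead of jena-iri, so that it needs no extra dependency. This is a swapped-in technique: java.net.URI follows RFC 2396 (with some documented deviations) rather than the RFC 3986/3987 rules that jena-iri implements, so the two accept and reject somewhat different strings. The class name UriSyntaxCheck and its methods are hypothetical and not part of ProvToolbox.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;
import java.util.logging.Logger;

/**
 * Hypothetical syntactic URI check that warns instead of failing.
 * Assumption: java.net.URI stands in for jena-iri.
 */
final class UriSyntaxCheck {
    private static final Logger LOG = Logger.getLogger(UriSyntaxCheck.class.getName());

    private UriSyntaxCheck() {}

    /** Returns a description of the syntax error, or empty if the string parses as a URI. */
    static Optional<String> syntaxError(String candidate) {
        try {
            new URI(candidate);
            return Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.of(e.getMessage());
        }
    }

    /** Logs a warning for an invalid URI but never throws, so reading can continue. */
    static boolean checkAndWarn(String candidate) {
        Optional<String> error = syntaxError(candidate);
        error.ifPresent(msg -> LOG.warning("Invalid URI syntax: " + msg));
        return !error.isPresent();
    }

    public static void main(String[] args) {
        // Strings as they might appear between angle brackets in: prefix foo <...>
        System.out.println(checkAndWarn("http://example.org/ns#")); // true
        System.out.println(checkAndWarn("http://example.org/a b")); // false: space is illegal
    }
}
```

Logging a warning rather than throwing keeps the reader tolerant of malformed documents, in line with the suggestion to issue a warning.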

One place where this check could occur is when a QualifiedName is created: in addition to checking the syntax of the QualifiedName itself, we could also check the URI it denotes.
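A minimal sketch of checking the denoted URI at creation time. CheckedQualifiedName is a hypothetical class, not ProvToolbox's QualifiedName interface; it assumes the denoted URI is the namespace URI concatenated with the local part, and it omits the syntax check on the prefix and local part themselves.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

/** Hypothetical qualified name whose denoted URI is checked in the constructor. */
final class CheckedQualifiedName {
    private static final Logger LOG = Logger.getLogger(CheckedQualifiedName.class.getName());

    private final String namespaceUri;
    private final String prefix;
    private final String localPart;
    private final boolean validUri;

    CheckedQualifiedName(String namespaceUri, String prefix, String localPart) {
        this.namespaceUri = namespaceUri;
        this.prefix = prefix;
        this.localPart = localPart;
        // (A check of the prefix/local-part syntax would go here; omitted in this sketch.)
        // Check the URI that the qualified name denotes, and warn if it is malformed.
        this.validUri = isSyntacticallyValid(getUri());
        if (!validUri) {
            LOG.warning("Qualified name " + prefix + ":" + localPart
                    + " denotes an invalid URI: " + getUri());
        }
    }

    /** Assumption: the denoted URI is the namespace URI followed by the local part. */
    String getUri() {
        return namespaceUri + localPart;
    }

    boolean hasValidUri() {
        return validUri;
    }

    private static boolean isSyntacticallyValid(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new CheckedQualifiedName("http://example.org/", "ex", "e1").hasValidUri());     // true
        System.out.println(new CheckedQualifiedName("http://example.org/my ns/", "ex", "e1").hasValidUri()); // false
    }
}
```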

A further issue related to URIs concerns Java's handling of file URIs. Java represents a file URI as follows:

file:/my/path/to/a/file.ext

whereas the URI syntax is

file:///my/path/to/a/file.ext

This is probably due to Java supporting an older version of the file URI RFC.
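The difference can be reproduced with the JDK alone: on Unix-like systems, java.io.File.toURI() yields the single-slash form, while java.nio.file.Path.toUri() yields the triple-slash form. The toTripleSlash helper below is a hypothetical sketch of a normalization step that rewrites the former into the latter; it assumes a lower-case file: scheme and an absolute local path.

```java
import java.io.File;
import java.nio.file.Paths;

final class FileUriForms {
    /**
     * Rewrites "file:/path" into "file:///path" (empty authority).
     * Strings already in "file://..." form, or with another scheme, are returned unchanged.
     */
    static String toTripleSlash(String uri) {
        if (uri.startsWith("file:/") && !uri.startsWith("file://")) {
            return "file://" + uri.substring("file:".length());
        }
        return uri;
    }

    public static void main(String[] args) {
        String path = "/my/path/to/a/file.ext";
        // Legacy API: on Unix-like systems this prints file:/my/path/to/a/file.ext
        System.out.println(new File(path).toURI());
        // NIO API: on Unix-like systems this prints file:///my/path/to/a/file.ext
        System.out.println(Paths.get(path).toUri());
        // Normalized legacy form: file:///my/path/to/a/file.ext
        System.out.println(toTripleSlash("file:/my/path/to/a/file.ext"));
    }
}
```

Where code can be changed, switching from File.toURI() to Path.toUri() may avoid the single-slash form altogether; the helper is only needed where such strings are produced elsewhere.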

One place where this matters is prov-rdf, where a relative URI is converted into an absolute URI using a base URI. When no base URI is declared in the Turtle file, it is taken to be the file's URI. At that point, Java generates a file URI with the wrong syntax. This needs to be fixed, since we have seen in some applications that interoperability with RDF packages such as Jena becomes problematic.
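A sketch of how a default base could be derived and applied when a Turtle file declares none, assuming resolution via java.net.URI.resolve. BaseUriResolution, its methods and the /home/user/doc.ttl path are hypothetical and are not prov-rdf's code. Because java.net.URI.resolve does not preserve an empty authority, resolving against a file:/// base may again produce the single-slash form, so the result is normalized a second time.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class BaseUriResolution {
    /** Rewrites "file:/path" into "file:///path"; any other string is returned unchanged. */
    static String toTripleSlash(String uri) {
        if (uri.startsWith("file:/") && !uri.startsWith("file://")) {
            return "file://" + uri.substring("file:".length());
        }
        return uri;
    }

    /** Hypothetical default base for a Turtle file that declares none: the file's own URI. */
    static URI defaultBase(Path turtleFile) {
        return URI.create(toTripleSlash(turtleFile.toAbsolutePath().toUri().toString()));
    }

    /** Resolves a relative URI against the base, then normalizes the result again. */
    static String resolve(URI base, String relative) {
        // resolve() may drop the empty authority of file:///..., yielding file:/...
        return toTripleSlash(base.resolve(relative).toString());
    }

    public static void main(String[] args) {
        URI base = defaultBase(Paths.get("/home/user/doc.ttl")); // hypothetical file
        System.out.println(base);                  // file:///home/user/doc.ttl on Unix-like systems
        System.out.println(resolve(base, "ex#a")); // file:///home/user/ex#a on Unix-like systems
    }
}
```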

A further, secondary issue is that when the Turtle file https://gist.githubusercontent.com/lucmoreau/b70a40401d933c01282f/raw/e55ee935208d0a571a4f14425e6dbbb5656ef1da/gistfile1.txt, which contains a relative URI, is processed by the provenance translator https://provenance.ecs.soton.ac.uk/validator/view/translator.html, the result leaks a working directory.
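The leak is consistent with the default-base behavior described above: if the uploaded file is stored at a local path and that file's URI is used as the base, every relative URI in the output embeds that path. A minimal sketch of the mechanism and of one possible mitigation (resolving against an explicit base instead); the /srv/translator/uploads path and the http://example.org/doc/ base are hypothetical, and this is not the translator's actual code.

```java
import java.net.URI;
import java.nio.file.Paths;

final class WorkingDirectoryLeak {
    static String resolveAgainst(URI base, String relative) {
        return base.resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Hypothetical server-side location of an uploaded Turtle file.
        URI fileBase = Paths.get("/srv/translator/uploads/upload.ttl").toUri();
        // Resolving against the file URI exposes the server-side directory in the output.
        System.out.println(resolveAgainst(fileBase, "ex#a"));
        // Resolving against an explicit, neutral base does not.
        URI explicitBase = URI.create("http://example.org/doc/");
        System.out.println(resolveAgainst(explicitBase, "ex#a")); // http://example.org/doc/ex#a
    }
}
```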

lucmoreau, Aug 10 '15 21:08