java-rdfa

Pages without a base URI cannot be parsed

Open quoll opened this issue 15 years ago • 6 comments

Web pages without a base URI cannot be parsed with the default Resolvers. For instance, one of the example pages on the home page doesn't have a base: http://examples.tobyinkster.co.uk/hcard

The IRIResolver first looks at the "base" (provided in the first parameter) and throws a RuntimeException if it is null. Since web pages can be messy, this will happen too often for the approach to be useful. The simplest solution for the user is to subclass IRIResolver with something like: public String resolver(String base, String relative) { return super.resolver(base == null ? getDefaultBase() : base, relative); }

However, if the aim is to handle all web pages flexibly, this override is required all the time, and not just for special configurations. Also, the docs on the home page don't mention this class at all, so some exploring was needed to find the solution.
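
The null-base fallback described above can be sketched without java-rdfa at all. The class below is a hypothetical illustration built on `java.net.URI`, not the library's IRIResolver; the name `FallbackResolver` and its constructor argument are assumptions made for the example. `java.net.URI` is not a full IRI implementation, so edge cases may differ from what the parser does internally.

```java
import java.net.URI;

/** Hypothetical resolver (not part of java-rdfa): uses a default base when none is supplied. */
final class FallbackResolver {
    private final String defaultBase;

    FallbackResolver(String defaultBase) {
        if (defaultBase == null) {
            throw new IllegalArgumentException("defaultBase must not be null");
        }
        this.defaultBase = defaultBase;
    }

    /** Resolves relative against base, or against the default base if base is null. */
    String resolve(String base, String relative) {
        String effectiveBase = (base == null) ? defaultBase : base;
        return URI.create(effectiveBase).resolve(relative).toString();
    }
}
```

With a default base of `http://example.org/docs/page.html`, a call such as `resolve(null, "other.html")` resolves against that default instead of throwing.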

I'd like to suggest using one of these approaches:

  1. Drop the method ParserFactory.createReaderForFormat(StatementSink, ParserFactory.Format, Setting...), and make IRIResolver abstract.
  2. Introduce a new version of ParserFactory.createReaderForFormat that has a "default base" as a new parameter. Since createReaderForFormat is already the main method that users read about, the issue should be more apparent to them.
  3. Add a method to the StatementSink that allows the sink to specify where it wants relative URIs to be resolved against. This is the main code that a user will be writing (other than the call to createReaderForFormat), so it is the other reasonable place to consider putting it. The base could be returned from something like getDefaultBase. This is my least-preferred method, since it breaks existing implementations of StatementSink.
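
As a rough sketch of option 3: on Java 8 or later (which postdates this thread), an interface default method would let a sink offer a fallback base without breaking existing implementations. Everything below is hypothetical: `BaseAwareSink` is a stand-in for StatementSink with a single placeholder method, and `BaseSelection.effectiveBase` is an invented helper, not java-rdfa API.

```java
/** Hypothetical stand-in for StatementSink; only what the sketch needs. */
interface BaseAwareSink {
    void addTriple(String subject, String predicate, String object);

    // Assumption: a default method, so implementations written before
    // this method existed still compile and simply report "no default".
    default String getDefaultBase() {
        return null;
    }
}

/** Hypothetical helper showing how a parser could pick the base to use. */
final class BaseSelection {
    private BaseSelection() {}

    static String effectiveBase(String documentBase, BaseAwareSink sink) {
        if (documentBase != null) {
            return documentBase; // the document's own base always wins
        }
        String fallback = sink.getDefaultBase();
        if (fallback == null) {
            throw new IllegalStateException("No base URI from the document or the sink");
        }
        return fallback;
    }
}
```

The document's base takes precedence here, so a sink-supplied default only matters for input that arrives with no base at all.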

quoll, Aug 09 '10 18:08

Thanks for the report. Oh how I hate base...

Of course http://examples.tobyinkster.co.uk/hcard does have a base (i.e. that URI), but I know what you mean. InputSource is permitted to have a null systemId, which means it's always possible to sneak null bases into the parser.
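
To illustrate the point about InputSource: an `org.xml.sax.InputSource` built from a Reader has no systemId until one is set explicitly. The helper class and method names below are hypothetical; only the `InputSource` calls are standard JDK API.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;

/** Hypothetical helpers showing how a null base can reach a SAX-based parser. */
final class InputSourceBaseDemo {
    private InputSourceBaseDemo() {}

    /** Built from a Reader only: getSystemId() returns null. */
    static InputSource withoutBase(String markup) {
        return new InputSource(new StringReader(markup));
    }

    /** Same input, but with the systemId (and hence a base) set explicitly. */
    static InputSource withBase(String markup, String systemId) {
        InputSource source = new InputSource(new StringReader(markup));
        source.setSystemId(systemId);
        return source;
    }
}
```

Callers who read markup from a string or stream can avoid the null case by always supplying the URI the content was fetched from.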

ParserFactory needs a rethink anyway, so I'm not averse to changing it at all. However I do like the StatementSink idea.

shellac, Aug 10 '10 22:08

Is there a way now to parse RDFa containing reified statements? For example, let's say the following RDFa-snippet:

<div class="myRDFaContainer" xmlns:mex="http://my.example.com/foobar/">
  <div class="description">
    <div rel="rdf:type" resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    <div xmlns:d1e26="http://www.w3.org/2001/XMLSchema#" property="mex:disclaimer" content="Some important disclaimer." datatype="xsd:string" />
    <div rel="rdf:object" resource="http://my.example.com/foobar/jsr234_v1_1_camera" />
    <div rel="rdf:predicate" resource="http://my.example.com/foobar/java_api" />
    <div rel="rdf:subject" resource="http://my.example.com/id/abc123/My-Device" />
  </div>
  <div class="description">
    <div rel="rdf:type" resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    <div xmlns:d1e33="http://www.w3.org/2001/XMLSchema#" property="mex:disclaimer" content="Another important disclaimer." datatype="xsd:string" />
    <div xmlns:d1e35="http://www.w3.org/2001/XMLSchema#" property="rdf:object" content="57671680" datatype="xsd:nonNegativeInteger" />
    <div rel="rdf:predicate" resource="http://my.example.com/foobar/shared_memory" />
    <div rel="rdf:subject" resource="http://my.example.com/id/abc123/My-Device" />
  </div>
  <div class="description" about="http://my.example.com/id/abc123/My-Device">
    <div property="mex:identifier" content="http://my.example.com/id/abc123/My-Device" />
    <div rel="mex:java_api" resource="http://my.example.com/foobar/jsr234_v1_1_camera" />
    <div xmlns:d1e9="http://www.w3.org/2001/XMLSchema#" property="mex:shared_memory" content="57671680" datatype="xsd:nonNegativeInteger" />
  </div>
</div>

I cannot get java-rdfa to parse the reified statements correctly. I would like to parse the previous RDFa as the following RDF/XML:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:mex="http://my.example.com/foobar/">
  <rdf:Description>
    <mex:disclaimer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Some important disclaimer.</mex:disclaimer>
    <rdf:object rdf:resource="http://my.example.com/foobar/jsr234_v1_1_camera" />
    <rdf:predicate rdf:resource="http://my.example.com/foobar/java_api" />
    <rdf:subject rdf:resource="http://my.example.com/id/abc123/My-Device" />
    <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
  </rdf:Description>
  <rdf:Description>
    <mex:disclaimer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Another important disclaimer.</mex:disclaimer>
    <rdf:object rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">57671680</rdf:object>
    <rdf:predicate rdf:resource="http://my.example.com/foobar/shared_memory" />
    <rdf:subject rdf:resource="http://my.example.com/id/abc123/My-Device" />
    <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
  </rdf:Description>
  <rdf:Description rdf:about="http://my.example.com/id/abc123/My-Device">
    <mex:identifier>http://my.example.com/id/abc123/My-Device</mex:identifier>
    <mex:java_api rdf:resource="http://my.example.com/foobar/jsr234_v1_1_camera" />
    <mex:shared_memory rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">57671680</mex:shared_memory>
  </rdf:Description>
</rdf:RDF>

How can I achieve this? The IRIResolver forces me to use a non-null base, which IMO prevents me from parsing reified statements. Am I getting this right?

Pyppe, Nov 30 '10 10:11

This has nothing to do with base. You haven't declared the rdf namespace. Try adding xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" and it will work.

shellac, Nov 30 '10 10:11

The namespaces are defined, but I left them out of my previous message for simplicity.

The problem is that because the IRIResolver requires a base, I cannot parse the reified statements: I'm forced to declare some base URL. If I use e.g. "http://my.example.com/id/abc123/My-Device" as the base URL, the reified statements are not included in the parsed model.

I guess I could (maybe?) solve this by defining some unique base URL (such as "http://my-pseudo-base-url") and then parsing the RDFa with this code:

Class.forName("net.rootdev.javardfa.jena.RDFaReader");
Model model = ModelFactory.createDefaultModel();
model.read(inputDataContainingRDFa, "http://my-pseudo-base-url", "XHTML");

This results in:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:mex="http://my.example.com/foobar/">
  <rdf:Description about="http://my-pseudo-base-url">
    <rdf:subject rdf:resource="http://my.example.com/id/abc123/My-Device"/>
    <rdf:predicate rdf:resource="http://my.example.com/foobar/java_api"/>
    <rdf:predicate rdf:resource="http://my.example.com/foobar/shared_memory"/>
    <rdf:object rdf:resource="http://my.example.com/foobar/jsr234_v1_1_camera"/>
    <rdf:object rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">57671680</rdf:object>
    <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
    <voc:disclaimer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Another important disclaimer.</voc:disclaimer>
    <voc:disclaimer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Some important disclaimer.</voc:disclaimer>
  </rdf:Description>
  <rdf:Description rdf:about="http://my.example.com/id/abc123/My-Device">
    <mex:identifier>http://my.example.com/id/abc123/My-Device</mex:identifier>
    <mex:java_api rdf:resource="http://my.example.com/foobar/jsr234_v1_1_camera" />
    <mex:shared_memory rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">57671680</mex:shared_memory>
  </rdf:Description>
</rdf:RDF>

That would be kind of what I need, except that the statements about "http://my-pseudo-base-url" should be "anonymous" (a.k.a. reified). They are also defined (incorrectly, I believe) within a single common block.

I believe the example RDFa snippet should be a valid case of reification. Am I doing something wrong, or how can I parse this kind of RDFa correctly, preserving the reified statements?

Pyppe, Nov 30 '10 11:11

anonymous != reified. If you want them to be anonymous add a typeof="rdf:Statement" to the containing div.

The problem is that you really need a base, since you haven't provided a subject for your triples. The typeof, or adding an about somewhere, would fix this.

shellac, Dec 01 '10 12:12

Thanks for the insight. That missing typeof attribute was indeed the problem.

Works now. Much obliged!

Pyppe, Dec 07 '10 11:12