CDT: Literals can have the same lexical form but different values.
This means that two RDF terms (Node) can be same-term by RDF Term rules but have different values.
They can't be safely used as keys into a map or cache.
They may be created in different parser runs even into the same graph.
Encountered in #3570 in the CDT test suite.
Can there be a way to enforce the restrictions of use?
@hartig
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.sparql.expr.ExprEvalException;

// Parse twice, different bnodes.
public static void main(String... args) {
    String x = """
            PREFIX cdt: <http://w3id.org/awslabs/neptune/SPARQL-CDTs/>
            PREFIX ex: <http://example.org/>
            ex:s ex:p "[_:b, 42]"^^cdt:List .
            """;
    // Two separate parser runs over the same input.
    Graph g1 = RDFParser.fromString(x, Lang.TURTLE).toGraph();
    Node o1 = g1.find().next().getObject();
    Graph g2 = RDFParser.fromString(x, Lang.TURTLE).toGraph();
    Node o2 = g2.find().next().getObject();
    // Prints true
    System.out.println("Same term: " + o1.sameTermAs(o2));
    // This case is caught by the datatype although it is an Expr exception.
    // "blank nodes in lists cannot be compared"
    try {
        System.out.print("Same value: ");
        System.out.println(o1.sameValueAs(o2));
    } catch (ExprEvalException ex) {
        System.out.println("ExprEvalException: " + ex.getMessage());
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    // Prints false.
    System.out.println("LiteralValue: " + o1.getLiteralValue().equals(o2.getLiteralValue()));
}
Blank nodes in CDT literals were the biggest challenge when we worked on the CDT spec. The behavior that you observe in your code snippet is indeed the expected behavior as per the definitions in the spec:
public static void main(String... args) {
    String x = """
            PREFIX cdt: <http://w3id.org/awslabs/neptune/SPARQL-CDTs/>
            PREFIX ex: <http://example.org/>
            ex:s ex:p "[_:b, 42]"^^cdt:List .
            """;
    Graph g1 = RDFParser.fromString(x, Lang.TURTLE).toGraph();
    Node o1 = g1.find().next().getObject();
    Graph g2 = RDFParser.fromString(x, Lang.TURTLE).toGraph();
    Node o2 = g2.find().next().getObject();
    // Prints true
    System.out.println("Same term: "+o1.sameTermAs(o2) );
I think that true is correct here, at least in the sense of what it means for two literals to be the same term.
    // This case is caught by the datatype although it is an Expr exception.
    // "blank nodes in lists cannot be compared"
    try {
        System.out.print("Same value: ");
        System.out.println(o1.sameValueAs(o2) );
    } catch (ExprEvalException ex) {
        System.out.println("ExprEvalException: "+ex.getMessage());
Throwing an exception in this case is correct as per the CDT spec; in particular, the same-value comparison of two cdt:List literals is captured by the list-equal function where Step 5.5.1 covers the case of two blank nodes as list elements.
It is an ExprEvalException because the exception is thrown in isEqual of CompositeDatatypeList, which is the method invoked for = comparisons in expressions.
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    // Prints false.
    System.out.println("LiteralValue: " +o1.getLiteralValue().equals(o2.getLiteralValue()));
}
Returning false here is also in line with the CDT spec. The part of the spec that is relevant in this case is Section 5.2 Importing Requirements, which essentially says that, every time data with CDT literals is loaded, blank node identifiers in the lexical forms of the CDT literals work as identifiers only within the context of each such loading process but not across loading processes. For a motivation of this, see Section 5.1 Motivation. Given that the code above invokes the parser twice, this counts as two loading processes.
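To make the scoping rule concrete, here is a minimal sketch in plain Java. It does not use the Jena API: `BlankNode` and `LoadContext` are hypothetical names invented for this illustration, and the sketch only assumes what is described above, namely that a blank node identifier is meaningful within one loading process and not across them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical blank node: no equals() override, so equality is object identity. */
final class BlankNode {
    final String label; // kept for debugging only; not part of identity

    BlankNode(String label) {
        this.label = label;
    }
}

/** Hypothetical stand-in for one loading process (one parser run). */
final class LoadContext {
    private final Map<String, BlankNode> labelToNode = new HashMap<>();

    /** The same label resolves to the same node, but only within this load. */
    BlankNode resolve(String label) {
        return labelToNode.computeIfAbsent(label, BlankNode::new);
    }
}

final class ScopedLabelsDemo {
    public static void main(String[] args) {
        LoadContext load1 = new LoadContext();
        BlankNode first = load1.resolve("b");
        BlankNode again = load1.resolve("b");

        LoadContext load2 = new LoadContext();
        BlankNode other = load2.resolve("b");

        System.out.println("Same load: " + (first == again));       // true
        System.out.println("Different loads: " + (first == other)); // false
    }
}
```

Under this model, the two `cdt:List` literals in the snippet share a lexical form but contain different blank nodes, which would explain why comparing their literal values returns false.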
Can there be a way to enforce the restrictions of use?
I don't understand what you are asking here. Can you please elaborate?
I was hoping that the CDT restrictions could be enforced, or at least warned about, but it looks like they can't.
CDTs can be used only in certain situations (read-only after a single parser run; careful use from SPARQL).
RDF datatypes are a functional mapping from lexical space to value space, and the assumption that "same lexical form" implies "same value" is made throughout the code base.
Example: graphs are indexed by RDF terms. Java code that accesses a graph typically creates a new Node and then calls graph.find(?, ?, literal).
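A minimal sketch of how that assumption can go wrong, in plain Java rather than the Jena API. `Term`, `TermCacheDemo` and `cachedValue` are hypothetical names made up for this illustration. The sketch assumes term equality depends only on lexical form and datatype, while the parsed value depends on the parser run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical literal term: equality ignores the parsed value. */
final class Term {
    final String lexicalForm;
    final String datatype;
    final Object value; // parsed value; for a CDT literal it depends on the parser run

    Term(String lexicalForm, String datatype, Object value) {
        this.lexicalForm = lexicalForm;
        this.datatype = datatype;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Term)) return false;
        Term t = (Term) o;
        return lexicalForm.equals(t.lexicalForm) && datatype.equals(t.datatype);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lexicalForm, datatype);
    }
}

final class TermCacheDemo {
    /** Returns the cached value for a term, caching it on first sight. */
    static Object cachedValue(Map<Term, Object> cache, Term term) {
        return cache.computeIfAbsent(term, t -> t.value);
    }

    public static void main(String[] args) {
        // Plain Objects stand in for the blank nodes created by two parser runs.
        Object bnodeRun1 = new Object();
        Object bnodeRun2 = new Object();
        Term t1 = new Term("[_:b, 42]", "cdt:List", List.of(bnodeRun1, 42));
        Term t2 = new Term("[_:b, 42]", "cdt:List", List.of(bnodeRun2, 42));

        Map<Term, Object> cache = new HashMap<>();
        cachedValue(cache, t1);
        Object hit = cachedValue(cache, t2);

        System.out.println("Same term: " + t1.equals(t2));              // true
        System.out.println("Same value: " + t1.value.equals(t2.value)); // false
        System.out.println("Cache hit is run-1 value: " + (hit == t1.value)); // true
    }
}
```

The lookup for the second term silently returns the value parsed in the first run. A term-indexed graph or cache has nothing to detect this with, since both terms are equal and hash identically.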