Generation of syntactically incorrect queries when evaluating a SERVICE pattern
Version
5.3.0
What happened?
I have noticed that, while evaluating a federated query, Jena substitutes an RDF term for a variable even where that term is not valid in the variable's position.
This can be demonstrated by the following federated query:
SELECT * WHERE {
VALUES ?P { "literal" }
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm> {
?S ?P ?O.
}
}
Jena sends the following query to the endpoint:
SELECT *
WHERE
{ ?S "literal" ?O }
However, such a query is not syntactically correct.
Relevant output and stacktrace
Are you interested in making a pull request?
None
- When a federated query with a VALUES clause is processed, the variable from the VALUES clause gets substituted with the literal value directly.
- This substitution happens in Service.java where it creates the query to send to the remote endpoint.
- The issue is that when a literal is substituted for the predicate position, it creates a syntactically invalid SPARQL query because predicates must be IRIs or variables, not literals.
- The specific code path is in Service.java, lines 189-190: it first calls Op opRestored = Rename.reverseVarRename(opRemote, true); and then builds the remote query with query = OpAsQuery.asQuery(opRestored);
- The transformation doesn't validate whether the substitution results in a syntactically valid SPARQL pattern.
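The failure mode described in the list above can be reproduced in isolation. The following is a minimal sketch that uses hypothetical stand-in types (Term, Kind, substitutePattern) rather than Jena's own Node, Triple, or Substitute classes. It only illustrates how substitution that ignores the position of a term yields the pattern from the report.

```java
import java.util.Map;

// Hypothetical stand-in model; none of these types are part of Jena.
final class SubstitutionDemo {
    enum Kind { VARIABLE, IRI, LITERAL }

    /** Simplified RDF term: a kind plus its lexical value. */
    static final class Term {
        final Kind kind;
        final String value;

        Term(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        /** Renders the term roughly as it would appear in SPARQL text. */
        String render() {
            switch (kind) {
                case VARIABLE: return "?" + value;
                case IRI:      return "<" + value + ">";
                default:       return "\"" + value + "\"";
            }
        }
    }

    /** Replaces a bound variable with its value, regardless of position. */
    static Term substitute(Term term, Map<String, Term> binding) {
        if (term.kind == Kind.VARIABLE && binding.containsKey(term.value))
            return binding.get(term.value);
        return term;
    }

    /** Substitutes into all three positions and renders the triple pattern. */
    static String substitutePattern(Term s, Term p, Term o, Map<String, Term> binding) {
        return substitute(s, binding).render() + " "
             + substitute(p, binding).render() + " "
             + substitute(o, binding).render();
    }

    /** Mirrors the report: VALUES ?P { "literal" } applied to ?S ?P ?O. */
    static String demo() {
        Map<String, Term> binding = Map.of("P", new Term(Kind.LITERAL, "literal"));
        return substitutePattern(
            new Term(Kind.VARIABLE, "S"),
            new Term(Kind.VARIABLE, "P"),
            new Term(Kind.VARIABLE, "O"),
            binding);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: ?S "literal" ?O
    }
}
```

The rendered pattern has a literal in predicate position, which is exactly what the remote endpoint rejects.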
Proposed Fix for Federated Query Variable Substitution Bug
The issue is in the SPARQL SERVICE clause handling, where Jena substitutes a variable with a literal value even if that would create an invalid SPARQL query (for example, when using a literal as a predicate).
When a federated query like this is executed:
SELECT * WHERE {
VALUES ?P { "literal" }
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm> {
?S ?P ?O.
}
}
Jena incorrectly sends this to the remote endpoint:
SELECT * WHERE { ?S "literal" ?O }
This is syntactically invalid since a predicate must be an IRI or a variable, not a literal.
Proposed solution:
- Add validation in org.apache.jena.sparql.core.Substitute.java to check if a substitution would create an invalid triple pattern:
public static Triple substitute(Triple triple, Binding binding) {
    if (isNotNeeded(binding))
        return triple;

    Node s = triple.getSubject();
    Node p = triple.getPredicate();
    Node o = triple.getObject();

    Node s1 = substitute(s, binding);
    Node p1 = substitute(p, binding);
    Node o1 = substitute(o, binding);

    // NEW: Validate that a literal in predicate position is not allowed
    if (p1.isLiteral()) {
        // Either keep the original variable or throw an error
        // Option 1: Keep the original variable
        p1 = p;
        // Option 2: Throw an error
        // throw new QueryBuildException("Cannot substitute literal '" + p1 + "' in predicate position");
    }

    Triple t = triple;
    if (s1 != s || p1 != p || o1 != o)
        t = Triple.create(s1, p1, o1);
    return t;
}
- Alternatively, modify org.apache.jena.sparql.exec.http.Service.java to validate the resulting query after substitution:
Op opRestored = Rename.reverseVarRename(opRemote, true);
// Check if substitution created invalid triples
// This could be done by traversing the op structure and checking for literals in predicate positions
// For example: opRestored = validateSubstitutions(opRestored);
query = OpAsQuery.asQuery(opRestored);
Here, validateSubstitutions is a proposed helper that would inspect the operation structure for any triple pattern with a literal in predicate position.
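The core of such a check could look like the sketch below. It is only an illustration under stated assumptions: Term, Kind, TriplePattern, and invalidPatterns are hypothetical stand-ins, not Jena classes, and the patterns are given as a flat list. A real implementation would presumably have to walk the Op tree to collect the triple patterns first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in model; none of these types are part of Jena.
final class PredicateCheck {
    enum Kind { VARIABLE, IRI, BLANK_NODE, LITERAL }

    /** Simplified RDF term: a kind plus a label. */
    static final class Term {
        final Kind kind;
        final String label;

        Term(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }
    }

    /** Simplified triple pattern with subject, predicate, and object. */
    static final class TriplePattern {
        final Term subject, predicate, object;

        TriplePattern(Term subject, Term predicate, Term object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** In a plain triple pattern the predicate must be an IRI or a variable. */
    static boolean isValidPredicate(Term predicate) {
        return predicate.kind == Kind.IRI || predicate.kind == Kind.VARIABLE;
    }

    /** Returns the patterns whose predicate would make the query invalid. */
    static List<TriplePattern> invalidPatterns(List<TriplePattern> patterns) {
        List<TriplePattern> invalid = new ArrayList<>();
        for (TriplePattern t : patterns) {
            if (!isValidPredicate(t.predicate))
                invalid.add(t);
        }
        return invalid;
    }

    public static void main(String[] args) {
        Term s = new Term(Kind.VARIABLE, "S");
        Term o = new Term(Kind.VARIABLE, "O");
        TriplePattern bad = new TriplePattern(s, new Term(Kind.LITERAL, "literal"), o);
        TriplePattern good = new TriplePattern(s, new Term(Kind.VARIABLE, "P"), o);
        // Only the pattern with the literal predicate is reported.
        System.out.println(invalidPatterns(List.of(bad, good)).size()); // prints: 1
    }
}
```

Returning the offending patterns, rather than a boolean, leaves the caller free to either restore the original variable or raise an error, matching the two options discussed for Substitute.java.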
The simplest and most conservative solution would be Option 1: keep the original variable whenever substitution would place a literal in predicate position.