
Generation of syntactically incorrect queries when evaluating a SERVICE pattern

Open · galgonek opened this issue 10 months ago · 1 comment

Version

5.3.0

What happened?

I have noticed that, during the evaluation of a federated query, Jena substitutes an RDF term for a variable even in cases where that RDF term cannot be valid in the variable's position.

This can be demonstrated by the following federated query:

SELECT * WHERE {
  VALUES ?P { "literal" }

  SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm> {
      ?S ?P ?O.
  }
}

Jena sends the following query to the endpoint:

SELECT  *
WHERE
  { ?S  "literal"  ?O }

However, such a query is not syntactically correct.
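For reference, here is a small Java model of the grammar rule involved. It is not Jena code: the class TriplePatternPositions and its Kind and Position enums are hypothetical, and the model ignores property paths, the 'a' keyword and blank nodes. It encodes the SPARQL 1.1 grammar, where the subject and object of a triple pattern are VarOrTerm (so literals are syntactically accepted there) but the predicate (Verb) is only VarOrIri, which is why the generated pattern above is rejected.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model (not Jena code) of which kinds of term the
// SPARQL 1.1 grammar accepts in each position of a triple pattern.
// Property paths, the 'a' keyword and blank nodes are deliberately ignored.
final class TriplePatternPositions {

    enum Kind { VARIABLE, IRI, LITERAL }

    enum Position { SUBJECT, PREDICATE, OBJECT }

    // Subject and object are VarOrTerm, so literals are syntactically allowed;
    // the predicate (Verb) is VarOrIri, so a literal is not.
    private static final Map<Position, Set<Kind>> ALLOWED = Map.of(
        Position.SUBJECT, EnumSet.of(Kind.VARIABLE, Kind.IRI, Kind.LITERAL),
        Position.PREDICATE, EnumSet.of(Kind.VARIABLE, Kind.IRI),
        Position.OBJECT, EnumSet.of(Kind.VARIABLE, Kind.IRI, Kind.LITERAL));

    static boolean allowed(Position position, Kind kind) {
        return ALLOWED.get(position).contains(kind);
    }

    public static void main(String[] args) {
        // { ?S "literal" ?O } puts a LITERAL in the PREDICATE position.
        System.out.println(allowed(Position.PREDICATE, Kind.LITERAL)); // false
        System.out.println(allowed(Position.OBJECT, Kind.LITERAL));    // true
    }
}
```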

Relevant output and stacktrace


Are you interested in making a pull request?

None

galgonek, Feb 07 '25 10:02

  1. When a federated query with a VALUES clause is processed, the variable from the VALUES clause is replaced directly by the literal value.
  2. This substitution happens in Service.java, where the query to send to the remote endpoint is created.
  3. The issue is that substituting a literal into the predicate position produces a syntactically invalid SPARQL query, because predicates must be IRIs or variables, not literals.
  4. The specific code path responsible is in Service.java, lines 189-190, which call Op opRestored = Rename.reverseVarRename(opRemote, true); and then query = OpAsQuery.asQuery(opRestored);.
  5. The transformation doesn't validate whether the substitution results in a syntactically valid SPARQL pattern.
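The substitution without validation described in points 1 and 5 can be reproduced with a minimal, self-contained Java model. This is a sketch under simplifying assumptions, not Jena's API: NaiveSubstitution, Term, Var, Iri, Literal and TriplePattern are hypothetical stand-ins for Jena's Node, Triple and Binding, and a binding is modelled as a Map from variable name to term.

```java
import java.util.Map;

// Hypothetical, simplified model (not Jena's API) of substituting a binding
// into a triple pattern without checking the resulting positions.
final class NaiveSubstitution {

    // A term is a variable (?name), an IRI (<...>) or a literal ("...").
    sealed interface Term permits Var, Iri, Literal {}
    record Var(String name) implements Term { public String toString() { return "?" + name; } }
    record Iri(String iri) implements Term { public String toString() { return "<" + iri + ">"; } }
    record Literal(String lexical) implements Term { public String toString() { return "\"" + lexical + "\""; } }

    record TriplePattern(Term s, Term p, Term o) {
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // Replace a bound variable by its value; any other term is returned unchanged.
    static Term substitute(Term t, Map<String, Term> binding) {
        if (t instanceof Var v && binding.containsKey(v.name())) {
            return binding.get(v.name());
        }
        return t;
    }

    // No validation: a literal can end up in the predicate position.
    static TriplePattern substitute(TriplePattern tp, Map<String, Term> binding) {
        return new TriplePattern(substitute(tp.s(), binding),
                                 substitute(tp.p(), binding),
                                 substitute(tp.o(), binding));
    }

    public static void main(String[] args) {
        TriplePattern pattern = new TriplePattern(new Var("S"), new Var("P"), new Var("O"));
        Map<String, Term> binding = Map.of("P", new Literal("literal")); // VALUES ?P { "literal" }
        System.out.println(substitute(pattern, binding)); // ?S "literal" ?O
    }
}
```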

Proposed Fix for Federated Query Variable Substitution Bug

The issue is in the SPARQL SERVICE clause handling, where Jena substitutes a variable with a literal value even if that would create an invalid SPARQL query (for example, when using a literal as a predicate).

When a federated query like this is executed:

SELECT * WHERE {
  VALUES ?P { "literal" }

  SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm> {
      ?S ?P ?O.
  }
}

Jena incorrectly sends this to the remote endpoint:

SELECT * WHERE { ?S "literal" ?O }

This is syntactically invalid since a predicate must be an IRI or a variable, not a literal.

Proposed solution:

  1. Add validation in org.apache.jena.sparql.core.Substitute.java to check if a substitution would create an invalid triple pattern:

public static Triple substitute(Triple triple, Binding binding) {
    if (isNotNeeded(binding))
        return triple;

    Node s = triple.getSubject();
    Node p = triple.getPredicate();
    Node o = triple.getObject();

    Node s1 = substitute(s, binding);
    Node p1 = substitute(p, binding);
    Node o1 = substitute(o, binding);

    // NEW: Validate that a literal in predicate position is not allowed
    if (p1.isLiteral()) {
        // Either keep the original variable or throw an error
        // Option 1: Keep the original variable
        p1 = p;
        // Option 2: Throw an error
        // throw new QueryBuildException("Cannot substitute literal '" + p1 + "' in predicate position");
    }

    Triple t = triple;
    if (s1 != s || p1 != p || o1 != o)
        t = Triple.create(s1, p1, o1);
    return t;
}
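The proposed change can be tried out outside Jena with a simplified model. In this sketch all names are hypothetical: a Map stands in for Binding, IllegalArgumentException stands in for QueryBuildException, and a Policy value selects between option 1 (keep the original variable) and option 2 (throw an error).

```java
import java.util.Map;

// Hypothetical model (not Jena code) of the proposed check in Substitute:
// after substitution, a literal in predicate position is either replaced by
// the original variable or rejected with an exception.
final class CheckedSubstitution {

    sealed interface Term permits Var, Iri, Literal {}
    record Var(String name) implements Term { public String toString() { return "?" + name; } }
    record Iri(String iri) implements Term { public String toString() { return "<" + iri + ">"; } }
    record Literal(String lexical) implements Term { public String toString() { return "\"" + lexical + "\""; } }

    record TriplePattern(Term s, Term p, Term o) {
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // KEEP_VARIABLE mirrors option 1, THROW mirrors option 2.
    enum Policy { KEEP_VARIABLE, THROW }

    static Term substitute(Term t, Map<String, Term> binding) {
        if (t instanceof Var v && binding.containsKey(v.name())) {
            return binding.get(v.name());
        }
        return t;
    }

    static TriplePattern substitute(TriplePattern tp, Map<String, Term> binding, Policy policy) {
        Term s1 = substitute(tp.s(), binding);
        Term p1 = substitute(tp.p(), binding);
        Term o1 = substitute(tp.o(), binding);
        if (p1 instanceof Literal) {
            if (policy == Policy.THROW) {
                throw new IllegalArgumentException(
                    "Cannot substitute literal " + p1 + " in predicate position");
            }
            p1 = tp.p(); // option 1: keep the original variable
        }
        return new TriplePattern(s1, p1, o1);
    }

    public static void main(String[] args) {
        TriplePattern pattern = new TriplePattern(new Var("S"), new Var("P"), new Var("O"));
        Map<String, Term> binding = Map.of("P", new Literal("literal"));
        System.out.println(substitute(pattern, binding, Policy.KEEP_VARIABLE)); // ?S ?P ?O
        try {
            substitute(pattern, binding, Policy.THROW);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Cannot substitute literal "literal" in predicate position
        }
    }
}
```

One trade-off worth weighing: with KEEP_VARIABLE the remote query becomes valid but is no longer restricted by the VALUES binding, so the endpoint may return far more rows than before. Whether those rows are then correctly discarded depends on how the SERVICE results are joined back with the binding, which this sketch does not model.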

  2. Alternatively, modify org.apache.jena.sparql.exec.http.Service.java to validate the resulting query after substitution:

Op opRestored = Rename.reverseVarRename(opRemote, true);

// Check if substitution created invalid triples.
// This could be done by traversing the op structure and checking for literals in predicate positions.
// For example:
// opRestored = validateSubstitutions(opRestored);

query = OpAsQuery.asQuery(opRestored);

Here, validateSubstitutions would be a new helper that inspects the operation structure for any triple pattern with a literal in the predicate position.
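A possible shape for the hypothetical validateSubstitutions helper, again as a self-contained Java model rather than Jena code. Bgp and Join are simplified stand-ins for Jena's operator classes (a real version would more likely be built on Jena's OpVisitor or Transform machinery, not shown here), and throwing IllegalStateException is an arbitrary choice for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Jena code) of validateSubstitutions: walk a greatly
// simplified operator tree after substitution and reject it if any triple
// pattern has a literal in predicate position.
final class SubstitutionValidator {

    sealed interface Term permits Var, Iri, Literal {}
    record Var(String name) implements Term {}
    record Iri(String iri) implements Term {}
    record Literal(String lexical) implements Term {}

    record TriplePattern(Term s, Term p, Term o) {}

    // Stand-ins for a basic graph pattern and a join of two sub-operators.
    sealed interface Op permits Bgp, Join {}
    record Bgp(List<TriplePattern> patterns) implements Op {}
    record Join(Op left, Op right) implements Op {}

    // Recursively collect every triple pattern whose predicate is a literal.
    static void collectInvalid(Op op, List<TriplePattern> out) {
        if (op instanceof Bgp bgp) {
            for (TriplePattern tp : bgp.patterns()) {
                if (tp.p() instanceof Literal) {
                    out.add(tp);
                }
            }
        } else if (op instanceof Join join) {
            collectInvalid(join.left(), out);
            collectInvalid(join.right(), out);
        }
    }

    // Returns the op unchanged if it is valid, otherwise throws.
    static Op validateSubstitutions(Op op) {
        List<TriplePattern> invalid = new ArrayList<>();
        collectInvalid(op, invalid);
        if (!invalid.isEmpty()) {
            throw new IllegalStateException(
                invalid.size() + " triple pattern(s) with a literal in predicate position");
        }
        return op;
    }

    public static void main(String[] args) {
        Op ok = new Bgp(List.of(new TriplePattern(new Var("S"), new Var("P"), new Var("O"))));
        Op bad = new Join(ok, new Bgp(List.of(
            new TriplePattern(new Var("S"), new Literal("literal"), new Var("O")))));
        System.out.println(validateSubstitutions(ok) == ok); // true
        try {
            validateSubstitutions(bad);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // 1 triple pattern(s) with a literal in predicate position
        }
    }
}
```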

The simplest and most conservative solution would be to implement option 1, which simply keeps the original variable whenever substituting a literal into the predicate position would make the query invalid.

plturrell, May 14 '25 01:05