jena icon indicating copy to clipboard operation
jena copied to clipboard

Prologue not compatible with PrefixMapping2

Open Aklakan opened this issue 1 year ago • 3 comments

Version

4.7.0-SNAPSHOT

What happened?

Preface: As mentioned e.g. at https://issues.apache.org/jira/browse/JENA-2309 we are using a prefix cc dump (~2.9K prefixes) Also, with the LinkedSparqlQuery project, we are parsing logs of SPARQL queries. Often servers have predefined prefixes which do not appear in the logs so query parsing needs a "fragmented" approach with query string being parsed against a preconfigured set of prefixes. Because we want to parse million of queries we do not want to spend time uselessly copying billions of prefixes (all thousands prefixes for every query) - hence, an approach based on e.g. PrefixMapping2 is just what we want. I am mentioning this in order to avoid @afs from performing a quick fix that seals off the APIs I need by e.g. changing all methods to always do the copying :sweat_smile:

Unfortunately, the following breaks because Prologue setPrefix calls removeNsPrefix:

PrefixMapping pm = new PrefixMapping2(PrefixMapping.Extended);
Query query = new Query();
query.setPrefixMapping(pm);
QueryFactory.parse(query, "PREFIX rdf: <http://foo.bar/baz/> SELECT * {}", null, Syntax.syntaxARQ);

Relevant output and stacktrace

Caused by: java.lang.UnsupportedOperationException: PrefixMapping2: prefix 'rdf' in the immutable map
	at org.apache.jena.sparql.util.PrefixMapping2.removeNsPrefix(PrefixMapping2.java:68)
	at org.apache.jena.sparql.core.Prologue.setPrefix(Prologue.java:122)
	at org.apache.jena.sparql.lang.QueryParserBase.setPrefix(QueryParserBase.java:117)

Are you interested in making a pull request?

Maybe

Aklakan avatar Aug 29 '22 16:08 Aklakan

No code calls the PrefixMapping2 constructors so we can be delete it.

It says "Updates go to the local (second) copy only". It is (was) for locally extending prefixes mapping. In particular, it does not alter the global pmap which would be a concurrency problem.

And if you'd let the code through, you'd find PrefixMapping.Extended is immutable.

The general solution is a buffering prefix map (records diffs). BufferingPrefixMap. Sounds like what you want is the "add only.no flush" temporary extensions version of that.

NB PrefixMap vs PrefixMapping.

As a general point on JENA-2309 -- Jena has many interfaces. Tweaking implementations for one use case needs to be balanced with having clear code for general use (the majority of Jena users). One-impl-does-everything can be hard to maintain. Not everything is a bug.

afs avatar Aug 30 '22 08:08 afs

Hm, the advantage of PrefixMapping2 is that the internal 'added' map is again a PrefixMapping and the lookup inherts its performance characteristics.

BufferingPrefixMap only uses a plain HashMap for the additions which means iri-to-prefix lookups fall back to O(n) I guess in the presence of deletions there is not really a better way.

So BufferingPrefixMap is not a replacement from a performance perspective.

One-impl-does-everything can be hard to maintain.

Yes, but

  • most importantly, I'd expect that in general use a solution that can deliver O(log(n)) is appreciated over a O(n) one and
  • implementations are not that complex and
  • the existence of PrefixMapping2 (and the removed FastAbbreviatingPrefixMap) testifies that at least at some point these use case were relevant (and they still are for me).

I would also be fine with doing a port of PrefixMapping2 to e.g. PrefixMap2. But it's not exhilarating having to clutter my own projects with such rather basic stuff (especially when things already existed - and become subject to removal without replacement due to my reports).

And actually the problem is not even related to PrefixMapping2 but a needless call to removeNsPrefix in Prologue that would break any PrefixMapping2-like approach:

// class Prologue:
            String oldExpansion = prefixMap.getNsPrefixURI(prefix) ;
            if ( Objects.equals(oldExpansion, expansion) )
                return ;
            if ( oldExpansion != null ) // <- remove
                prefixMap.removeNsPrefix(prefix) ; // <- remove

            // Isn't the contract of PrefixMap anyway that this should replace a prior mapping?
            prefixMap.setNsPrefix(prefix, expansion) ;

Aklakan avatar Aug 30 '22 12:08 Aklakan

Hm, the advantage of PrefixMapping2 is that the internal 'added' map is again a PrefixMapping and the lookup inherts its performance characteristics.

There is a BufferingPrefixMapping as well.

a solution that can deliver O(log(n)) is appreciated over a O(n) one

I don't understand that.

PrefixMapBuffering, PrefixMapStd and BufferingPrefixMapping are O(1) prefix lookup.

iri-to-prefix

Parsing involves prefix-to-iri.

it's not exhilarating having to clutter my own projects with such rather basic stuff

After its many years of evolution, sure Jena does not meet your ideals or the requirements of your day-job, and you can see better ways of of doing things.

But Jena is what it is. Open source.

Code has to be maintained over years. Code does not get written then remain unchanging. Maintenance work, questions from users, reviewing PRs, releases, security alerts ... rather basic stuff.

Just recently, a lot of time has gone into the not-exhilarating work of cleaning up javadoc. Java changed. The javadoc was valid, and now (Java11) it isn't.

Jena is not a dumping ground. No volunteer open source project can be.

the problem is not even related to PrefixMapping2

Quite. Did you try the fix?

to avoid @afs from performing a quick fix that seals off the APIs I need

Don't make this personal.

afs avatar Aug 30 '22 13:08 afs

Don't make this personal.

Apologies, that was inappropriate.

Quite. Did you try the fix?

The PrefixMapping2 approach works with the PR. The change to Prologue by this PR should be valid anyway. If that class gets replaced with something more future proof I am of course fine with it.

Aklakan avatar Sep 25 '22 15:09 Aklakan