jena icon indicating copy to clipboard operation
jena copied to clipboard

PrefixMapStd is very slow for lookups that 'miss'

Open Aklakan opened this issue 3 years ago • 0 comments

PR: https://github.com/apache/jena/pull/1475

Version

4.6.0-SNAPSHOT

What happened?

Another follow up on https://issues.apache.org/jira/browse/JENA-2309 I investigated the issue of slow PrefixMapStd further and it turns out that when there are 'hits' using its fast track approach it's very fast - however on 'misses' it gets extremely slow because then it falls back to iteration of all prefixes in the map.

Using a small benchmark runner (part of the PR - see here) shows the the PrefixMapStd can be easily brought to require dozens of seconds for lookups that can actually be accomplished in a split second. For that I created a new PrefixMap intended as a replacement that combines the best of both worlds: The fast track of PrefixMapStd and the trie backing (together with a guava cache) of the removed FastAbbreviatingPrefixMap.

Approach 0 = Jena's original PrefixMapStd:

Run 1 with base <http://example.org/> and separator / using approach 0: Static IRI lookups took 0.082 seconds
Run 1 with base <http://example.org/> and separator / using approach 0: Dynamic IRI lookups took 33.443 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 0: Static IRI lookups took 25.482 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 0: Dynamic IRI lookups took 31.088 seconds

Approach 1 = my optimized PrefixMap:

Run 1 with base <http://example.org/> and separator / using approach 1: Static IRI lookups took 0.070 seconds
Run 1 with base <http://example.org/> and separator / using approach 1: Dynamic IRI lookups took 0.401 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 1: Static IRI lookups took 0.099 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 1: Dynamic IRI lookups took 0.433 seconds

W.r.t. to the interface contract of PrefixMap I am not sure whether synchronization / thread-safe maps need to be used. PrefixMapStd used a ConcurrentHashMap but if synchronization is actually not a concern then a LinkedHashMap that gives control over the order may be a better choice.

Are you interested in making a pull request?

Yes

Aklakan avatar Aug 09 '22 12:08 Aklakan