[BUG] Execution time differs between different variants of predicates

Open dariok opened this issue 3 years ago • 3 comments

Describe the bug

Execution times differ between variants of a predicate that are logically equivalent and should therefore take the same time.

The following data were measured on a set of c. 4.9 million elements with @xml:id. There is an index for @type. Each run used different values in the not() check (the values in the table below are examples only). None of these values occurs anywhere in the data, so the result set always consisted of 4,880,027 items.

| # | predicate | run 1 | run 2 | run 3 | run 4 | average | σ² |
|---|-----------|-------|-------|-------|-------|---------|-----|
| 1 | [not(@type = ('t1', 't2')) and @xml:id] | 50.84s | 48.74s | 47.86s | 47.55s | 48.75s | 2.200 |
| 2 | [not(@type = ('t3', 't4'))][@xml:id] | 47.55s | 46.94s | 46.41s | 47.51s | 47.10s | 0.290 |
| 3 | [@xml:id and not(@type = ('t5', 't6'))] | 44.75s | 44.46s | 44.67s | 44.87s | 44.69s | 0.030 |
| 4 | [@xml:id][not(@type = ('t7', 't8'))] | 46.00s | 45.42s | 45.60s | 45.39s | 44.59s | 0.088 |
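
The average and σ² columns can be recomputed from the four run times. The following is a minimal sketch in plain XQuery 3.1, assuming that σ² is the sample variance (divisor n - 1); the report does not state which definition was used, and `local:stats` is a hypothetical helper name.

```xquery
xquery version "3.1";

(: Hypothetical helper: mean and sample variance of a row of timings in seconds.
   Assumption: sigma^2 in the table is the sample variance (divisor n - 1).
   Needs at least two values, otherwise the divisor is zero. :)
declare function local:stats($runs as xs:decimal+) as map(*) {
    let $n := count($runs)
    let $mean := avg($runs)
    (: sum of squared deviations from the mean, divided by n - 1 :)
    let $variance :=
        sum(for $r in $runs return ($r - $mean) * ($r - $mean)) div ($n - 1)
    return map { "mean": $mean, "variance": $variance }
};

(: run times of the third variant, taken from the table :)
local:stats((44.75, 44.46, 44.67, 44.87))
```

For the third variant this gives a mean of 44.6875, which rounds to the listed 44.69s, and a variance that rounds to the listed 0.030. Applied to the four runs of the fourth variant, the same function gives a mean of 45.6025, so the listed average of 44.59s may be a transcription slip.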

Several things can be observed here:

  • 1 and 2 are significantly slower than 3 and 4
  • the variance for 1 and 2 is significantly larger than for 3 and 4, and the variance for 1 is much larger than for 2
  • with the exception of the last run of 2, the queries get faster with each run for 1 and 2, but that may just be coincidence

Expected behavior

While some variance is to be expected, all query times should be roughly the same.

To Reproduce (query used for the last run)

xquery version "3.1";

collection("/db/apps/edoc/data/pd000006/texts/17xx")//*[@xml:id][not(@type = ('t31', 't32'))]
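
The report does not say how the run times were taken. One way to measure a single run inside the query itself is sketched below, assuming eXist-db's `util` module and its `util:system-dateTime()` function; forcing evaluation with `count()` is a choice made for this sketch, not something taken from the report.

```xquery
xquery version "3.1";

import module namespace util = "http://exist-db.org/xquery/util";

(: Assumption: timing is done inside the query; the original measurements
   may have been taken differently (e.g. with an external client). :)
let $start := util:system-dateTime()
let $hits := collection("/db/apps/edoc/data/pd000006/texts/17xx")//*[@xml:id][not(@type = ('t31', 't32'))]
(: count() forces the full result set to be evaluated before the clock stops :)
let $count := count($hits)
let $end := util:system-dateTime()
return map {
    "items": $count,
    (: dividing a duration by one second yields the elapsed time in seconds :)
    "seconds": ($end - $start) div xs:dayTimeDuration("PT1S")
}
```

Swapping in one of the other three predicate variants and a fresh pair of values for the not() check reproduces one cell of the table. The item count serves as a sanity check that the result set is still the full 4,880,027 items.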

Context (please always complete the following information):

  • OS: Ubuntu 20.04
  • eXist-db version: 5.3.0
  • Java Version: OpenJDK Runtime Environment (build 11.0.14.1+1-Ubuntu-0ubuntu1.20.04)

Additional context

  • How is eXist-db installed? JAR
  • Any custom changes in e.g. conf.xml? max. RAM to 6G

dariok, Apr 27 '22 13:04

For the purposes of comparing timings, it would be advisable to use consistent parameters - so use t1, t2 throughout all examples. The reason is that, as the tuning article states - https://exist-db.org/exist/apps/doc/tuning#selective - performance is expected to change based on how "selective" each filter is in a sequence of filters.

Update: Reading the description a bit more carefully, I see that the not() filters always return true(), regardless of the t1, t2 values. So I retract my advice!

Did you change these values so that you could achieve "fresh" timings? Was it necessary to change the t1, t2 values?

joewiz, Apr 27 '22 13:04

I used different values each time to avoid obscuring the underlying problem, i.e. to get fresh timings, indeed. I am not sure whether some form of caching could actually interfere with the timing, but I wanted to rule that out, hence changing the values each time (so variant 1, run 1 used ('t1', 't2') and the final run of variant 4 used ('t31', 't32')).

dariok, Apr 27 '22 13:04

Did you restart the database when changing the predicates? If not, then eXist's caches from one set of tests may contribute to the subsequent set. To eliminate the cache as a variable, I'd advise restarting the database whenever you change a predicate.

joewiz, Apr 27 '22 14:04