
index performance degradation from range to new-range

Open KitWallace opened this issue 2 months ago • 7 comments

I've compared the performance of simple element searches and found that the new-range index is slower for three different types of expression. I'm not sure how to create a test script, as the issue here is performance rather than behaviour. As a start, I've set up this Google Sheet with the results from Monex.

Trips old range XConf

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index>
    <lucene>
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
      <text qname="Trip"/>
    </lucene>
    <create qname="id" type="xs:string"/>
  </index>
</collection>

Trips new-range XConf

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index>
     <lucene>
       <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
       <text qname="Trip"/>
     </lucene>
     <range>
       <create qname="id" type="xs:string"/>
     </range>
  </index>
</collection>

Query

xquery version "3.1";
 
declare variable $local:trip-path := "/db/apps/BSA/trips";
declare variable $local:trip-collection := collection("/db/apps/BSA/trips");
declare variable $local:trips := collection("/db/apps/BSA/trips")//Trip;
declare variable $local:trip-path-new := "/db/apps/BSA/trips-new";
declare variable $local:trip-collection-new := collection("/db/apps/BSA/trips-new");
declare variable $local:trips-new := collection("/db/apps/BSA/trips-new")//Trip;
 
declare function local:get-trip1($tripid) {
  collection($local:trip-path)//Trip[id=$tripid]
};
 
declare function local:get-trip2($tripid) {
  $local:trip-collection//Trip[id=$tripid]
};
 
declare function local:get-trip3($tripid) {
  $local:trips[id=$tripid]
};
 
declare function local:get-trip1-new($tripid) {
  collection($local:trip-path-new)//Trip[id=$tripid]
};
 
declare function local:get-trip2-new($tripid) {
  $local:trip-collection-new//Trip[id=$tripid]
};
 
declare function local:get-trip3-new($tripid) {
  $local:trips-new[id=$tripid]
};

let $tripid := "trip-250"
return
  <tests>
    <test>{ count(for $i in 1 to 1000 return local:get-trip1($tripid)) }</test>
    <test>{ count(for $i in 1 to 1000 return local:get-trip2($tripid)) }</test>
    <test>{ count(for $i in 1 to 1000 return local:get-trip3($tripid)) }</test>
    <test>{ count(for $i in 1 to 1000 return local:get-trip1-new($tripid)) }</test>
    <test>{ count(for $i in 1 to 1000 return local:get-trip2-new($tripid)) }</test>
    <test>{ count(for $i in 1 to 1000 return local:get-trip3-new($tripid)) }</test>
  </tests>

In case 3, which uses a weak reference, new-range does not use the index at all, whereas under the old range index the index was used (Basic Indexing).

source: https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?usp=sharing

KitWallace avatar Oct 15 '25 14:10 KitWallace

@KitWallace It would be good to have some example data to test with. Would you be able to provide a file that can be used to produce a test set?

line-o avatar Oct 16 '25 18:10 line-o

We should include cases with a combined index:

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index>
     <range>
       <create qname="Trip" type="xs:string">
          <field name="Trip-id" match="id" type="xs:string"/>
       </create>
     </range>
  </index>
</collection>

line-o avatar Oct 16 '25 18:10 line-o

http://85.119.83.23:8080/exist/apps/BSA/data/trips.zip

KitWallace avatar Oct 16 '25 21:10 KitWallace

New timings

https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?gid=258071494#gid=258071494

The field index performs very little differently from the simpler form.

Query

In an expanded form, so that the Monex index timings can be correlated with the query text:

xquery version "3.1";

declare variable $local:base := "/db/apps/BSA/";
declare variable $local:trip-path := concat($local:base,"trips");
declare variable $local:trip-collection := collection(concat($local:base,"trips"));
declare variable $local:trips := collection(concat($local:base,"trips"))//Trip;
declare variable $local:trip-path-new := concat($local:base,"trips-new");
declare variable $local:trip-collection-new := collection(concat($local:base,"trips-new"));
declare variable $local:trips-new := collection(concat($local:base,"trips-new"))//Trip;
declare variable $local:trip-path-field := concat($local:base,"trips-field");
declare variable $local:trip-collection-field := collection(concat($local:base,"trips-field"));
declare variable $local:trips-field := collection(concat($local:base,"trips-field"))//Trip;

declare function local:get-trip1($tripid) {
   collection($local:trip-path)//Trip[id=$tripid] 
};

declare function local:get-trip2($tripid) {
   $local:trip-collection//Trip[id=$tripid] 
};

declare function local:get-trip3($tripid) {
   $local:trips[id=$tripid] 
};

declare function local:get-trip1-new($tripid) {
   collection($local:trip-path-new)//Trip[id=$tripid] 
};

declare function local:get-trip2-new($tripid) {
   $local:trip-collection-new//Trip[id=$tripid] 
};

declare function local:get-trip3-new($tripid) {
   $local:trips-new[id=$tripid] 
};

declare function local:get-trip1-field($tripid) {
   collection($local:trip-path-field)//Trip[id=$tripid] 
};

declare function local:get-trip2-field($tripid) {
   $local:trip-collection-field//Trip[id=$tripid] 
};

declare function local:get-trip3-field($tripid) {
   $local:trips-field[id=$tripid] 
};
let $tripid := "trip-250"
return
  <tests>
      <test>
        {count(for $i in 1 to 1000
        return local:get-trip1($tripid)
        )
        } 
      </test>
     <test>
        {count(for $i in 1 to 1000
        return local:get-trip2($tripid)
        )
        } 
      </test>
      <test>
        {count(for $i in 1 to 1000
        return local:get-trip3($tripid)
        )
        } 
      </test>
       <test>
        {count(for $i in 1 to 1000
        return local:get-trip1-new($tripid)
        )
        } 
      </test>
     <test>
        {count(for $i in 1 to 1000
        return local:get-trip2-new($tripid)
        )
        } 
      </test>
      <test>
        {count(for $i in 1 to 1000
        return local:get-trip3-new($tripid)
        )
        } 
      </test>
            <test>
        {count(for $i in 1 to 1000
        return local:get-trip1-field($tripid)
        )
        } 
      </test>
     <test>
        {count(for $i in 1 to 1000
        return local:get-trip2-field($tripid)
        )
        } 
      </test>
      <test>
        {count(for $i in 1 to 1000
        return local:get-trip3-field($tripid)
        )
        } 
      </test>
  </tests>

KitWallace avatar Oct 16 '25 22:10 KitWallace

I've timed the alternative of holding all Trips in one document rather than as individual documents in a collection, and it's a mixed bag. Using the most efficient expression, the ratio of single-document time to collection time is 0.9 (faster) for the old range index and 1.92 (slower) for new-range.

https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?gid=258071494#gid=258071494

Of course, this is a small collection of documents: 3.4 MB.

KitWallace avatar Oct 20 '25 11:10 KitWallace

Restructured the index tests based on run-times, now including xml:id and = / eq.

xml:id is a clear winner; the differences between = and eq are marginal; the new-range / old-range ratio is about 1.5; some forms are to be avoided; weak references work (but slowly) with the old range index but not with new-range.

| id | preamble | search | setup-time | search-time | count |
|---|---|---|---|---|---|
| 14 | doc($local:trip-path-all-old-doc) | fn:id($tripid,$nodes) | 0 | 6 | 1000 |
| 11 | collection($local:trip-path-all-old) | $nodes//Trip[id = $tripid] | 0 | 20 | 1000 |
| 25 | collection($local:trip-path-all-old) | $nodes//Trip[id eq $tripid] | 0 | 20 | 1000 |
| 16 | collection($local:trip-path-old) | $nodes//Trip[id eq $tripid] | 3 | 26 | 1000 |
| 2 | collection($local:trip-path-old) | $nodes//Trip[id = $tripid] | 3 | 27 | 1000 |
| 22 | collection($local:trip-path-field) | $nodes//Trip[id eq $tripid] | 3 | 28 | 1000 |
| 8 | collection($local:trip-path-field) | $nodes//Trip[id = $tripid] | 3 | 30 | 1000 |
| 19 | collection($local:trip-path-new) | $nodes//Trip[id eq $tripid] | 3 | 39 | 1000 |
| 5 | collection($local:trip-path-new) | $nodes//Trip[id = $tripid] | 2 | 40 | 1000 |
| 13 | | fn:id($tripid,doc($local:trip-path-all-old-doc)) | 0 | 59 | 1000 |
| 10 | | doc($local:trip-path-all-old-doc)//Trip[id = $tripid] | 0 | 84 | 1000 |
| 24 | | doc($local:trip-path-all-old-doc)//Trip[id eq $tripid] | 0 | 84 | 1000 |
| 17 | collection($local:trip-path-old)//Trip | $nodes[id eq $tripid] | 3 | 607 | 1000 |
| 3 | collection($local:trip-path-old)//Trip | $nodes[id = $tripid] | 4 | 627 | 1000 |
| 26 | collection($local:trip-path-all-old)//Trip | $nodes[id eq $tripid] | 0 | 673 | 1000 |
| 12 | collection($local:trip-path-all-old)//Trip | $nodes[id = $tripid] | 1 | 704 | 1000 |
| 21 | | collection($local:trip-path-field)//Trip[id eq $tripid] | 0 | 2714 | 1000 |
| 15 | | collection($local:trip-path-old)//Trip[id eq $tripid] | 0 | 2728 | 1000 |
| 7 | | collection($local:trip-path-field)//Trip[id = $tripid] | 0 | 2747 | 1000 |
| 18 | | collection($local:trip-path-new)//Trip[id eq $tripid] | 0 | 2810 | 1000 |
| 4 | | collection($local:trip-path-new)//Trip[id = $tripid] | 0 | 2832 | 1000 |
| 1 | | collection($local:trip-path-old)//Trip[id = $tripid] | 0 | 3045 | 1000 |
| 20 | collection($local:trip-path-new)//Trip | $nodes[id eq $tripid] | 4 | 4537 | 1000 |
| 23 | collection($local:trip-path-field)//Trip | $nodes[id eq $tripid] | 3 | 4563 | 1000 |
| 9 | collection($local:trip-path-field)//Trip | $nodes[id = $tripid] | 3 | 4597 | 1000 |
| 6 | collection($local:trip-path-new)//Trip | $nodes[id = $tripid] | 4 | 4631 | 1000 |
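
For anyone wanting to reproduce the setup-time and search-time columns outside Monex, here is a minimal sketch of a harness that times the preamble and the search separately. It is an assumption about how such a split could be measured, not necessarily how the figures above were produced. It uses util:system-dateTime() from eXist-db's util module, the local:ms helper is hypothetical, and the collection path is borrowed from the earlier queries.

```xquery
xquery version "3.1";

(: Hypothetical helper: elapsed milliseconds between two timestamps. :)
declare function local:ms($from as xs:dateTime, $to as xs:dateTime) as xs:decimal {
  ($to - $from) div xs:dayTimeDuration("PT0.001S")
};

let $tripid := "trip-250"
let $t0 := util:system-dateTime()
(: preamble: bind the node set once :)
let $nodes := collection("/db/apps/BSA/trips-new")
let $t1 := util:system-dateTime()
(: search: repeat the lookup 1000 times against the bound node set :)
let $hits := for $i in 1 to 1000 return $nodes//Trip[id = $tripid]
let $t2 := util:system-dateTime()
return
  <test setup-time="{ local:ms($t0, $t1) }"
        search-time="{ local:ms($t1, $t2) }"
        count="{ count($hits) }"/>
```

Swapping the preamble and search expressions for any other row of the table should give a comparable pair of timings, though wall-clock figures taken this way will include query overhead that Monex may account for differently.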

KitWallace avatar Oct 22 '25 11:10 KitWallace

@line-o I've added your suggestion to try $nodes/id[.=$tripid]/.. where $nodes = collection($local:trip-path-all-old)//Trip (a weak reference). It uses the index, while $nodes[id=$tripid] does not.
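
To make the two forms easy to compare side by side, here is a small sketch that runs both against the same stored node set. The collection path is an assumption borrowed from the earlier queries, and the function names local:by-predicate and local:by-child-step are made up for illustration. Both functions should return the same Trip elements; only the index usage is reported to differ.

```xquery
xquery version "3.1";

(: Assumed path, taken from the queries earlier in this thread. :)
declare variable $local:nodes := collection("/db/apps/BSA/trips-new")//Trip;

(: Predicate applied to the stored node set: reported as not using the index. :)
declare function local:by-predicate($tripid as xs:string) as element(Trip)* {
  $local:nodes[id = $tripid]
};

(: Step down to the id child, filter there, then step back up to the
   parent Trip: reported as using the index. :)
declare function local:by-child-step($tripid as xs:string) as element(Trip)* {
  $local:nodes/id[. eq $tripid]/..
};

let $tripid := "trip-250"
return
  <compare>
    <predicate>{ count(local:by-predicate($tripid)) }</predicate>
    <child-step>{ count(local:by-child-step($tripid)) }</child-step>
  </compare>
```

Running each function on its own with Monex profiling enabled should show whether the index is touched in either case.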

and this form does use the index where

| id | preamble | search | setup-time | search-time | count |
|---|---|---|---|---|---|
| 14 | doc($local:trip-path-all-old-doc) | fn:id($tripid,$nodes) | 1 | 7 | 1000 |
| 11 | collection($local:trip-path-all-old) | $nodes//Trip[id = $tripid] | 0 | 21 | 1000 |
| 8 | collection($local:trip-path-field) | $nodes//Trip[id = $tripid] | 3 | 31 | 1000 |
| 20 | collection($local:trip-path-new)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 43 | 1000 |
| 5 | collection($local:trip-path-new) | $nodes//Trip[id = $tripid] | 3 | 44 | 1000 |
| 26 | collection($local:trip-path-all-old)//Trip | $nodes/id[. eq $tripid]/.. | 0 | 71 | 1000 |
| 13 | | fn:id($tripid,doc($local:trip-path-all-old-doc)) | 0 | 77 | 1000 |
| 10 | | doc($local:trip-path-all-old-doc)//Trip[id = $tripid] | 0 | 87 | 1000 |
| 25 | collection($local:trip-path-all-old) | $nodes//Trip/id[. eq $tripid]/.. | 0 | 135 | 1000 |
| 24 | | doc($local:trip-path-all-old-doc)//Trip/id[. eq $tripid]/.. | 0 | 214 | 1000 |
| 19 | collection($local:trip-path-new) | $nodes//Trip/id[. eq $tripid]/.. | 2 | 579 | 1000 |
| 12 | collection($local:trip-path-all-old)//Trip | $nodes[id = $tripid] | 0 | 663 | 1000 |
| 7 | | collection($local:trip-path-field)//Trip[id = $tripid] | 0 | 2806 | 1000 |
| 4 | | collection($local:trip-path-new)//Trip[id = $tripid] | 0 | 2888 | 1000 |
| 18 | | collection($local:trip-path-new)//Trip/id[. eq $tripid]/.. | 0 | 3482 | 1000 |
| 17 | collection($local:trip-path-old)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 4501 | 1000 |
| 23 | collection($local:trip-path-field)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 4602 | 1000 |
| 6 | collection($local:trip-path-new)//Trip | $nodes[id = $tripid] | 4 | 4724 | 1000 |
| 9 | collection($local:trip-path-field)//Trip | $nodes[id = $tripid] | 4 | 4725 | 1000 |
| 16 | collection($local:trip-path-old) | $nodes//Trip/id[. eq $tripid]/.. | 3 | 5060 | 1000 |
| 22 | collection($local:trip-path-field) | $nodes//Trip/id[. eq $tripid]/.. | 3 | 5142 | 1000 |
| 3 | collection($local:trip-path-old)//Trip | $nodes[id = $tripid] | 4 | 5203 | 1000 |
| 2 | collection($local:trip-path-old) | $nodes//Trip[id = $tripid] | 3 | 5696 | 1000 |
| 1 | | collection($local:trip-path-old)//Trip[id = $tripid] | 0 | 8023 | 1000 |
| 15 | | collection($local:trip-path-old)//Trip/id[. eq $tripid]/.. | 0 | 8112 | 1000 |
| 21 | | collection($local:trip-path-field)//Trip/id[. eq $tripid]/.. | 0 | 8154 | 1000 |
KitWallace avatar Oct 24 '25 08:10 KitWallace