index performance degradation from range to new-range
I've compared performance on simple element searches and found that the new-range index is slower in three different types of expression. I'm not sure how to create a test script, as the issue here is speed rather than behaviour. As a start, I've set up this Google Sheet with the results from Monex.
Trips old range XConf
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<lucene>
<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<text qname="Trip"/>
</lucene>
<create qname="id" type="xs:string"/>
</index>
</collection>
Trips new-range XConf
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<lucene>
<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<text qname="Trip"/>
</lucene>
<range>
<create qname="id" type="xs:string"/>
</range>
</index>
</collection>
Query
xquery version "3.1";
declare variable $local:trip-path := "/db/apps/BSA/trips";
declare variable $local:trip-collection := collection("/db/apps/BSA/trips");
declare variable $local:trips := collection("/db/apps/BSA/trips")//Trip;
declare variable $local:trip-path-new := "/db/apps/BSA/trips-new";
declare variable $local:trip-collection-new := collection("/db/apps/BSA/trips-new");
declare variable $local:trips-new := collection("/db/apps/BSA/trips-new")//Trip;
declare function local:get-trip1($tripid) {
collection($local:trip-path)//Trip[id=$tripid]
};
declare function local:get-trip2($tripid) {
$local:trip-collection//Trip[id=$tripid]
};
declare function local:get-trip3($tripid) {
$local:trips[id=$tripid]
};
declare function local:get-trip1-new($tripid) {
collection($local:trip-path-new)//Trip[id=$tripid]
};
declare function local:get-trip2-new($tripid) {
$local:trip-collection-new//Trip[id=$tripid]
};
declare function local:get-trip3-new($tripid) {
$local:trips-new[id=$tripid]
};
let $tripid := "trip-250"
return
<tests>
<test>{ count(for $i in 1 to 1000 return local:get-trip1($tripid)) }</test>
<test>{ count(for $i in 1 to 1000 return local:get-trip2($tripid)) }</test>
<test>{ count(for $i in 1 to 1000 return local:get-trip3($tripid)) }</test>
<test>{ count(for $i in 1 to 1000 return local:get-trip1-new($tripid)) }</test>
<test>{ count(for $i in 1 to 1000 return local:get-trip2-new($tripid)) }</test>
<test>{ count(for $i in 1 to 1000 return local:get-trip3-new($tripid)) }</test>
</tests>
In case 3, which uses a weak reference, the index is not used at all with new-range, whereas with the old range index Monex reported that Basic Indexing was used.
source: https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?usp=sharing
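As a possible starting point for a test script, here is a minimal sketch of a timing harness. It assumes eXist-db's built-in `util:system-dateTime()` function; the helper `local:time`, its parameters, and the labels are hypothetical names introduced for illustration and are not part of the original query. It only measures wall-clock time, so Monex is still needed to see which index (if any) was used.

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of the original report: runs a
   zero-argument search function $runs times and reports the number
   of hits and the elapsed wall-clock time in milliseconds. :)
declare function local:time(
    $label as xs:string,
    $runs as xs:integer,
    $search as function() as item()*
) as element(test) {
    let $start := util:system-dateTime()
    let $hits := count(for $i in 1 to $runs return $search())
    let $end := util:system-dateTime()
    (: dividing two durations yields a plain number :)
    let $ms := ($end - $start) div xs:dayTimeDuration("PT0.001S")
    return
        <test label="{$label}" hits="{$hits}" ms="{$ms}"/>
};

let $tripid := "trip-250"
return
    <tests>{
        local:time("range", 1000, function() {
            collection("/db/apps/BSA/trips")//Trip[id = $tripid]
        }),
        local:time("new-range", 1000, function() {
            collection("/db/apps/BSA/trips-new")//Trip[id = $tripid]
        })
    }</tests>
```

Wrapping each search in an inline function may itself affect how the optimizer rewrites the expression, so timings from a harness like this should be cross-checked against the plain queries.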
@KitWallace It would be good to have some example data to test with. Would you be able to provide a file that can be used to produce a test set?
We should include cases with a combined index:
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<range>
<create qname="Trip" type="xs:string">
<field name="Trip-id" match="id" type="xs:string"/>
</create>
</range>
</index>
</collection>
http://85.119.83.23:8080/exist/apps/BSA/data/trips.zip
New timings
https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?gid=258071494#gid=258071494
The field index performs very little differently from the simpler form.
Query
In an expanded form so that Monex index timings can be correlated with the text
xquery version "3.1";
declare variable $local:base := "/db/apps/BSA/";
declare variable $local:trip-path := concat($local:base,"trips");
declare variable $local:trip-collection := collection(concat($local:base,"trips"));
declare variable $local:trips := collection(concat($local:base,"trips"))//Trip;
declare variable $local:trip-path-new := concat($local:base,"trips-new");
declare variable $local:trip-collection-new := collection(concat($local:base,"trips-new"));
declare variable $local:trips-new := collection(concat($local:base,"trips-new"))//Trip;
declare variable $local:trip-path-field := concat($local:base,"trips-field");
declare variable $local:trip-collection-field := collection(concat($local:base,"trips-field"));
declare variable $local:trips-field := collection(concat($local:base,"trips-field"))//Trip;
declare function local:get-trip1($tripid) {
collection($local:trip-path)//Trip[id=$tripid]
};
declare function local:get-trip2($tripid) {
$local:trip-collection//Trip[id=$tripid]
};
declare function local:get-trip3($tripid) {
$local:trips[id=$tripid]
};
declare function local:get-trip1-new($tripid) {
collection($local:trip-path-new)//Trip[id=$tripid]
};
declare function local:get-trip2-new($tripid) {
$local:trip-collection-new//Trip[id=$tripid]
};
declare function local:get-trip3-new($tripid) {
$local:trips-new[id=$tripid]
};
declare function local:get-trip1-field($tripid) {
collection($local:trip-path-field)//Trip[id=$tripid]
};
declare function local:get-trip2-field($tripid) {
$local:trip-collection-field//Trip[id=$tripid]
};
declare function local:get-trip3-field($tripid) {
$local:trips-field[id=$tripid]
};
let $tripid := "trip-250"
return
<tests>
<test>
{count(for $i in 1 to 1000
return local:get-trip1($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip2($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip3($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip1-new($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip2-new($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip3-new($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip1-field($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip2-field($tripid)
)
}
</test>
<test>
{count(for $i in 1 to 1000
return local:get-trip3-field($tripid)
)
}
</test>
</tests>
I've timed the alternative of keeping all Trips in one document rather than as individual documents in a collection, and it's a mixed bag. Using the most efficient expression, the ratio of single-document time to collection time is 0.9 (faster) for the old range index and 1.92 (slower) for new-range.
https://docs.google.com/spreadsheets/d/1Co2qCchUm7-AyrBWzCdx5xSIMrtb3rPdaV1_x5qfing/edit?gid=258071494#gid=258071494
Of course, this is a small collection of documents: 3.4 MB.
Restructured index tests, sorted by run-time and including xml:id and = / eq
xml:id is a clear winner; differences between = and eq are marginal; the new-range / old-range ratio is about 1.5; some forms are to be avoided; weak references work (but slowly) with the old range index but not with new-range.
| id | preamble | search | setup-time | search-time | count |
|---|---|---|---|---|---|
| 14 | doc($local:trip-path-all-old-doc) | fn:id($tripid,$nodes) | 0 | 6 | 1000 |
| 11 | collection($local:trip-path-all-old) | $nodes//Trip[id = $tripid] | 0 | 20 | 1000 |
| 25 | collection($local:trip-path-all-old) | $nodes//Trip[id eq $tripid] | 0 | 20 | 1000 |
| 16 | collection($local:trip-path-old) | $nodes//Trip[id eq $tripid] | 3 | 26 | 1000 |
| 2 | collection($local:trip-path-old) | $nodes//Trip[id = $tripid] | 3 | 27 | 1000 |
| 22 | collection($local:trip-path-field) | $nodes//Trip[id eq $tripid] | 3 | 28 | 1000 |
| 8 | collection($local:trip-path-field) | $nodes//Trip[id = $tripid] | 3 | 30 | 1000 |
| 19 | collection($local:trip-path-new) | $nodes//Trip[id eq $tripid] | 3 | 39 | 1000 |
| 5 | collection($local:trip-path-new) | $nodes//Trip[id = $tripid] | 2 | 40 | 1000 |
| 13 |  | fn:id($tripid,doc($local:trip-path-all-old-doc)) | 0 | 59 | 1000 |
| 10 |  | doc($local:trip-path-all-old-doc)//Trip[id = $tripid] | 0 | 84 | 1000 |
| 24 |  | doc($local:trip-path-all-old-doc)//Trip[id eq $tripid] | 0 | 84 | 1000 |
| 17 | collection($local:trip-path-old)//Trip | $nodes[id eq $tripid] | 3 | 607 | 1000 |
| 3 | collection($local:trip-path-old)//Trip | $nodes[id = $tripid] | 4 | 627 | 1000 |
| 26 | collection($local:trip-path-all-old)//Trip | $nodes[id eq $tripid] | 0 | 673 | 1000 |
| 12 | collection($local:trip-path-all-old)//Trip | $nodes[id = $tripid] | 1 | 704 | 1000 |
| 21 |  | collection($local:trip-path-field)//Trip[id eq $tripid] | 0 | 2714 | 1000 |
| 15 |  | collection($local:trip-path-old)//Trip[id eq $tripid] | 0 | 2728 | 1000 |
| 7 |  | collection($local:trip-path-field)//Trip[id = $tripid] | 0 | 2747 | 1000 |
| 18 |  | collection($local:trip-path-new)//Trip[id eq $tripid] | 0 | 2810 | 1000 |
| 4 |  | collection($local:trip-path-new)//Trip[id = $tripid] | 0 | 2832 | 1000 |
| 1 |  | collection($local:trip-path-old)//Trip[id = $tripid] | 0 | 3045 | 1000 |
| 20 | collection($local:trip-path-new)//Trip | $nodes[id eq $tripid] | 4 | 4537 | 1000 |
| 23 | collection($local:trip-path-field)//Trip | $nodes[id eq $tripid] | 3 | 4563 | 1000 |
| 9 | collection($local:trip-path-field)//Trip | $nodes[id = $tripid] | 3 | 4597 | 1000 |
| 6 | collection($local:trip-path-new)//Trip | $nodes[id = $tripid] | 4 | 4631 | 1000 |
@line-o I've added your suggestion to try $nodes/id[.=$tripid]/.. where $nodes = collection($local:trip-path-all-old)//Trip (a weak reference). This form does use the index, while $nodes[id=$tripid] does not.

| id | preamble | search | setup-time | search-time | count |
|---|---|---|---|---|---|
| 14 | doc($local:trip-path-all-old-doc) | fn:id($tripid,$nodes) | 1 | 7 | 1000 |
| 11 | collection($local:trip-path-all-old) | $nodes//Trip[id = $tripid] | 0 | 21 | 1000 |
| 8 | collection($local:trip-path-field) | $nodes//Trip[id = $tripid] | 3 | 31 | 1000 |
| 20 | collection($local:trip-path-new)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 43 | 1000 |
| 5 | collection($local:trip-path-new) | $nodes//Trip[id = $tripid] | 3 | 44 | 1000 |
| 26 | collection($local:trip-path-all-old)//Trip | $nodes/id[. eq $tripid]/.. | 0 | 71 | 1000 |
| 13 |  | fn:id($tripid,doc($local:trip-path-all-old-doc)) | 0 | 77 | 1000 |
| 10 |  | doc($local:trip-path-all-old-doc)//Trip[id = $tripid] | 0 | 87 | 1000 |
| 25 | collection($local:trip-path-all-old) | $nodes//Trip/id[. eq $tripid]/.. | 0 | 135 | 1000 |
| 24 |  | doc($local:trip-path-all-old-doc)//Trip/id[. eq $tripid]/.. | 0 | 214 | 1000 |
| 19 | collection($local:trip-path-new) | $nodes//Trip/id[. eq $tripid]/.. | 2 | 579 | 1000 |
| 12 | collection($local:trip-path-all-old)//Trip | $nodes[id = $tripid] | 0 | 663 | 1000 |
| 7 |  | collection($local:trip-path-field)//Trip[id = $tripid] | 0 | 2806 | 1000 |
| 4 |  | collection($local:trip-path-new)//Trip[id = $tripid] | 0 | 2888 | 1000 |
| 18 |  | collection($local:trip-path-new)//Trip/id[. eq $tripid]/.. | 0 | 3482 | 1000 |
| 17 | collection($local:trip-path-old)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 4501 | 1000 |
| 23 | collection($local:trip-path-field)//Trip | $nodes/id[. eq $tripid]/.. | 4 | 4602 | 1000 |
| 6 | collection($local:trip-path-new)//Trip | $nodes[id = $tripid] | 4 | 4724 | 1000 |
| 9 | collection($local:trip-path-field)//Trip | $nodes[id = $tripid] | 4 | 4725 | 1000 |
| 16 | collection($local:trip-path-old) | $nodes//Trip/id[. eq $tripid]/.. | 3 | 5060 | 1000 |
| 22 | collection($local:trip-path-field) | $nodes//Trip/id[. eq $tripid]/.. | 3 | 5142 | 1000 |
| 3 | collection($local:trip-path-old)//Trip | $nodes[id = $tripid] | 4 | 5203 | 1000 |
| 2 | collection($local:trip-path-old) | $nodes//Trip[id = $tripid] | 3 | 5696 | 1000 |
| 1 |  | collection($local:trip-path-old)//Trip[id = $tripid] | 0 | 8023 | 1000 |
| 15 |  | collection($local:trip-path-old)//Trip/id[. eq $tripid]/.. | 0 | 8112 | 1000 |
| 21 |  | collection($local:trip-path-field)//Trip/id[. eq $tripid]/.. | 0 | 8154 | 1000 |
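
For reference, the two weak-reference forms compared in the rows above can be written side by side as follows. This is only a sketch: the function name `local:get-trip3-new-rewritten` is hypothetical, and the collection path is taken from the earlier query. Both functions should return the same Trip elements; they differ only in where the comparison is applied.

```xquery
xquery version "3.1";

(: Weak reference: the Trip elements are bound once, outside the search :)
declare variable $local:trips-new := collection("/db/apps/BSA/trips-new")//Trip;

(: Predicate on the stored node set (the $nodes[id = $tripid] form) :)
declare function local:get-trip3-new($tripid as xs:string) {
    $local:trips-new[id = $tripid]
};

(: Hypothetical name: step down to id, filter there, then step back up
   to the parent Trip (the $nodes/id[. = $tripid]/.. form) :)
declare function local:get-trip3-new-rewritten($tripid as xs:string) {
    $local:trips-new/id[. = $tripid]/..
};

let $tripid := "trip-250"
return
    <tests>
        <test>{ count(local:get-trip3-new($tripid)) }</test>
        <test>{ count(local:get-trip3-new-rewritten($tripid)) }</test>
    </tests>
```

Whether the index is actually used for either form should be confirmed in Monex, since the timings above vary between the old range, new-range and field configurations.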