[BUG] `binary` Attribute in Lucene full text fields does not work any more.
Describe the bug
The `binary` attribute for fields in a Lucene full-text index no longer works.
Expected behavior
The `binary` attribute should work.
To Reproduce
Try to index a field as binary.
With binary:
xquery version "3.1";
module namespace t="http://exist-db.org/xquery/test";
(: LIBRARIES :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(: NAMESPACES :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";
(: VARIABLES :)
declare variable $t:XML :=
<div>
<test>Adm. 1,10</test>
<test>Bdm. 1,11</test>
<test>Cdm. 1,12</test>
<test>Edm. 1,1</test>
<test>Fdm. 1,2</test>
<test>Gdm. 1,3</test>
<test>Zdm. 1,4</test>
<test>Wdm. 1,5</test>
<test>Odm. 1,6</test>
<test>Ydm. 1,7</test>
<test>Cdm. 1,8</test>
<test>Vdm. 1,9</test>
<test>Pdm. 1,13</test>
<test>Edm. 1,14</test>
</div>;
declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Full-text indexing with Lucene -->
<lucene>
<!-- Elements upon which to build an index. -->
<text qname="test">
<field name="sortable" expression="./string()" type="xs:string" binary="yes"/>
</text>
</lucene>
</index>
</collection>;
(: FUNCTIONS :)
declare
%test:setUp
function t:setup() {
let $testCol := xmldb:create-collection("/db", "test")
let $indexCol := xmldb:create-collection("/db/system/config/db", "test")
return (
xmldb:store("/db/test", "test.xml", $t:XML),
xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
xmldb:reindex("/db/test")
)
};
declare
%test:tearDown
function t:tearDown() {
xmldb:remove("/db/test"),
xmldb:remove("/db/system/config/db/test")
};
declare
%test:name('Sorted result.')
%test:assertExists
%test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
%test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
let $options := map {
'fields': ('sortable')
}
let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
return
(
let $values := ft:binary-field($index, "sortable","xs:string")
where count($values gt 0 )
for $field in $values
order by $field ascending
return
(
$field
)
)
};
Without binary:
xquery version "3.1";
module namespace t="http://exist-db.org/xquery/test";
(: LIBRARIES :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(: NAMESPACES :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";
(: VARIABLES :)
declare variable $t:XML :=
<div>
<test>Adm. 1,10</test>
<test>Bdm. 1,11</test>
<test>Cdm. 1,12</test>
<test>Edm. 1,1</test>
<test>Fdm. 1,2</test>
<test>Gdm. 1,3</test>
<test>Zdm. 1,4</test>
<test>Wdm. 1,5</test>
<test>Odm. 1,6</test>
<test>Ydm. 1,7</test>
<test>Cdm. 1,8</test>
<test>Vdm. 1,9</test>
<test>Pdm. 1,13</test>
<test>Edm. 1,14</test>
</div>;
declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Full-text indexing with Lucene -->
<lucene>
<!-- Elements upon which to build an index. -->
<text qname="div">
<field name="sortable" expression="./test/string()"/>
</text>
</lucene>
</index>
</collection>;
(: FUNCTIONS :)
declare
%test:setUp
function t:setup() {
let $testCol := xmldb:create-collection("/db", "test")
let $indexCol := xmldb:create-collection("/db/system/config/db", "test")
return (
xmldb:store("/db/test", "test.xml", $t:XML),
xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
xmldb:reindex("/db/test")
)
};
declare
%test:tearDown
function t:tearDown() {
xmldb:remove("/db/test"),
xmldb:remove("/db/system/config/db/test")
};
declare
%test:name('Sorted result.')
%test:assertExists
%test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
%test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
let $options := map {
'fields': ('sortable')
}
let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
return
(
let $values := ft:field($index, "sortable","xs:string")
where count($values gt 0 )
for $field in $values
order by $field ascending
return
(
$field
)
)
};
Context (please always complete the following information)
- Build: eXist-6.2.0
- Java: 1.8.0_422
- OS: Ubuntu 22.04.4 LTS - Linux 6.8.0-40-generic amd64
Additional context
- How is eXist-db installed? JAR installer
@scheidelerl Thank you for this complete issue report. I would like to know with which version of exist-db the above test-suite passes.
To read the values of binary fields, a new function, ft:binary-field, was added, and I cannot see you using it. Maybe that is the issue?
see also https://exist-db.org/exist/apps/doc/lucene#retrieve-fields "Retrieving Field Content"
Hey,
thank you for the reply.
The eXist-db Version is the current build 6.2.0. Installed with the JAR Installer.
If I use ft:binary-field($index, 'sortable', 'xs:string') instead, which I think is the intended way of using it, it doesn't work either.
In eXide the attribute binary shows this linter error: [cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']
The eXist log file does not say anything about it.
If I use the field without binary, no problem occurs.
If I try to apply the collection.xconf with eXide this error occurs: Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)
If I use only doc("/db/test/test.xml")/div[ft:query(., ())] with the binary attribute in the field child, the result is empty, and ft:field($node as node(), $field as xs:string) and ft:binary-field($node as node(), $field as xs:string, $type as xs:string) throw these errors:
err:XPTY0004 It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in 2.5.4 SequenceType Matching. checking function parameter 1 in call ft:field($index, "sortable"): XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: ft:field($node as node(), $field as xs:string) item()*. Expected cardinality: exactly one, got 0.
err:XPTY0004 It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in 2.5.4 SequenceType Matching. checking function parameter 1 in call ft:binary-field($index, "sortable", "xs:string"): XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: ft:binary-field($node as node(), $field as xs:string, $type as xs:string) item()*. Expected cardinality: exactly one, got 0.
I know what it means; it is self-explanatory. But it implies that the index is not fully built because an error occurs, one that is not reported in the log or anywhere else and that is ultimately related to the attribute, because everything works without it.
@scheidelerl you need to have some hits in order to sort them using the binary field values. I suspect that your call to ft:query returns an empty sequence. Can you check that?
In eXide the attribute binary shows this linter error: [cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']
It may well be that the schema was not updated to include the binary attribute.
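The check for hits suggested above could look like the following. This is only a sketch: it assumes the index is defined on the test elements (as in the "With binary" configuration), that an empty query string matches all indexed nodes, and it calls ft:binary-field once per hit because its first parameter is declared as exactly one node.

```xquery
xquery version "3.1";

declare namespace ft="http://exist-db.org/xquery/lucene";

(: Assumption: the Lucene index is on <test>, so the hits are <test> elements. :)
let $options := map { "fields": ("sortable") }
let $hits := doc("/db/test/test.xml")//test[ft:query(., (), $options)]
return
    if (empty($hits)) then
        (: No hits means there is nothing to pass to ft:binary-field,
           which would otherwise raise the XPTY0004 cardinality error. :)
        "ft:query returned no hits; check that the index was built"
    else
        (: ft:binary-field expects exactly one node, so call it per hit. :)
        for $hit in $hits
        let $key := ft:binary-field($hit, "sortable", "xs:string")
        order by $key ascending
        return $key
```

If the first branch is taken, the problem lies with the index configuration or reindexing rather than with ft:binary-field itself.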
I updated the test above and added a second one, so there is now one with and one without binary.
It works when I index the element directly, because a binary field seems to require a single value.
The hint was the error when applying the index in eXide.
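To make the single-value constraint concrete, here is a sketch of two configurations side by side. The first text element is the one from the "With binary" test. The second is an assumed reconstruction of the failing variant, which is not shown in this thread: a binary field on div whose expression selects every test child, and which would presumably produce the 'DocValuesField "sortable" appears more than once' error quoted earlier.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <lucene>
            <!-- Works: one indexed node per <test>, so the field has one value. -->
            <text qname="test">
                <field name="sortable" expression="./string()" type="xs:string" binary="yes"/>
            </text>
            <!-- Assumed failing variant (reconstruction, not taken from the report):
                 the expression yields one value per <test> child of <div>,
                 so the binary field would receive several values for one node.
            <text qname="div">
                <field name="sortable" expression="./test/string()" type="xs:string" binary="yes"/>
            </text>
            -->
        </lucene>
    </index>
</collection>
```

The non-binary field in the "Without binary" test uses the same multi-valued expression on div without an error, which matches the observation that only binary fields need a single value.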
This brings me to the following questions:
- Why is this not the case for normal fields, so that the behaviour stays adaptable when I realize that I don't need to query certain values?
- Why is there no reference to this in the documentation? And please don't tell me that it is sufficiently explained because the default value is specified as xs:string.
- Why does the log not show this as an error when I apply the index?
- Why can't I declare type="xs:string*" to prevent this error?
- Why does this work in 5.4.0?
!!!! → 6. What do I have to do if I only want to query at the parent level and have several values in one field, but want faster access?
Why this works in 5.4.0?
As far as I know binary fields were added in version 6.2.0. That means it cannot work in version 5.4.0.
Why this works in 5.4.0?
As far as I know binary fields were added in version 6.2.0. That means it cannot work in version 5.4.0.
That is not what I meant.
I think that field handling in eXist-db 5.4.0 treats sequences or sub-sequences differently, and that the binary parameter either also worked without ft:binary-field or was silently ignored.
I only noticed the difference in handling because we migrated from 5.4.0 to 6.2.0.
It shouldn't be a problem, but I searched the development logs and only the added function ft:binary-field was mentioned. I couldn't find the minor but important differences in the handling of fields, facets, or indexes.
Maybe the documentation should include migration notes for people who want to switch versions: which major version handles which content and how, and which functions are available.