exist icon indicating copy to clipboard operation
exist copied to clipboard

[BUG] Register binary attribute in collection.xconf.xsd

Open joewiz opened this issue 1 year ago • 3 comments

Describe the bug

PR #4541 added support for indexing fields as binary values, controlled by a new @binary attribute on the <field> element. The PR neglected to register this new attribute in collection.xconf.xsd.

As a result, eXide, for example, flags uses of this attribute. See https://github.com/eXist-db/exist/issues/5431#issuecomment-2317631397.

Expected behavior

The xconf schema should define the attribute and its allowed values.

To Reproduce

Validate this document from the linked issue against collection.xconf.xsd:

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="div">
        <field name="sortable" expression="./test/string()" type="xs:string" binary="yes"/>
      </text>
    </lucene>
  </index>
</collection>

Via the linked issue:

If I try to apply the collection.xconf with eXide this error occurs:

Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)

Screenshots

n/a

Context (please always complete the following information)

  • Build: eXist-6.2.0
  • Java: 1.8.0_422
  • OS: Ubuntu 22.04.4 LTS - Linux 6.8.0-40-generic amd64

Additional context

  • How is eXist-db installed? n/a
  • Any custom changes in e.g. conf.xml? n/a

joewiz avatar Aug 29 '24 13:08 joewiz

I'd be happy to submit a PR adding field/@binary to https://github.com/eXist-db/exist/blob/develop/schema/collection.xconf.xsd#L179-L186.

I just need to know the allowed values.

The example code on the Lucene documentation page shows binary="yes", but the tests added in the PR all say binary="true". I take it from https://github.com/eXist-db/exist/pull/4541/files#diff-bdeb3fc906efd9177af97cd1351231dc0e9a9aa14e477246b82b57483d6618e2R113 that either "true" or "yes" are accepted. It seems that any other value is treated as false (same as omitting the attribute).

(The other attributes in this schema that take yes - e.g., caseOpt - don't offer true|false as options. So the binary attribute appears to be an outlier in accepting this larger set of values in this schema.)

If that's a correct interpretation of the code, I'd add a binaryOpt entry to fieldAttrType, defined as follows:

<xs:attributeGroup name="binaryOpt">
    <xs:attribute name="binary" use="optional">
        <xs:simpleType>
            <xs:restriction base="xs:token">
                <xs:enumeration value="yes">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="true">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="no">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="false">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
            </xs:restriction>
        </xs:simpleType>
    </xs:attribute>
</xs:attributeGroup>

Is this a correct interpretation of what the code supports / implies?

joewiz avatar Aug 29 '24 13:08 joewiz

As I read the code, you are correct. @adamretter

dizzzz avatar Aug 30 '24 09:08 dizzzz

@dizzzz Thanks!

joewiz avatar Aug 30 '24 14:08 joewiz