Add `docvalue_fields` Support for `dense_vector` Fields
Summary
This PR implements support for the `docvalue_fields` parameter for `dense_vector` fields, as described in issue #108470.
Details
- Single Value Constraint:
  - Each document's `dense_vector` field contains a single vector value (multi-valued vectors are not allowed). The vector value is an array of numeric types. The `docvalue_fields` response returns this array directly as the value of `fields.vector`, without additional nesting.
- Data Type Handling:
  - For `byte`-type `dense_vector` fields, the `docvalue_fields` response is identical to the `_source` response.
  - For `float`-type `dense_vector` fields, as with other floating-point `docvalue_fields`, there may be minor precision differences between the `docvalue_fields` and `_source` responses. However, these differences are negligible and do not affect vector comparisons.
- Implementation:
  - Introduced a new `DenseDocValueFormat` class to handle `dense_vector` `docvalue_fields`. This class does not perform any actual formatting.
  - Overrode the `VectorDVLeafFieldData#getFormattedValues` method to return `FormattedDocValues` for `byte`, `bit`, and `float` type `dense_vector` fields.
- Testing:
  - Added YAML tests to cover the `docvalue_fields` parameter for `dense_vector` fields.
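To make the single-value constraint and the `getFormattedValues` override more concrete, here is a minimal Java sketch of how one vector can surface as a flat array under `fields.vector`: the per-document values report the vector's dimension count and then return one component per call, so the collected list is `[1.0, 2.0, 3.0]` rather than `[[1.0, 2.0, 3.0]]`. `SimpleFormattedValues`, `FloatVectorFormattedValues`, and `DocValueFieldsCollector` are hypothetical names. The interface only loosely mirrors the advance/count/next shape of Elasticsearch's `FormattedDocValues`, and this is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-segment doc-values reader: position on a
// document, report how many values it has, then read them one at a time.
// Loosely modeled on the shape of Elasticsearch's FormattedDocValues.
interface SimpleFormattedValues {
    boolean advanceExact(int docId);

    int docValueCount();

    Object nextValue();
}

// One float vector per document (single-value constraint). Each component is
// returned as a separate value, so the caller builds a flat list.
final class FloatVectorFormattedValues implements SimpleFormattedValues {
    private final Map<Integer, float[]> vectorsByDoc; // hypothetical in-memory storage
    private float[] current;
    private int index;

    FloatVectorFormattedValues(Map<Integer, float[]> vectorsByDoc) {
        this.vectorsByDoc = vectorsByDoc;
    }

    @Override
    public boolean advanceExact(int docId) {
        current = vectorsByDoc.get(docId);
        index = 0;
        return current != null;
    }

    @Override
    public int docValueCount() {
        return current.length; // number of dimensions, not number of vectors
    }

    @Override
    public Object nextValue() {
        return current[index++]; // pass-through: the value is not reformatted
    }
}

final class DocValueFieldsCollector {
    // Builds what would appear under fields.<name> for a single document.
    static List<Object> collect(SimpleFormattedValues values, int docId) {
        List<Object> out = new ArrayList<>();
        if (values.advanceExact(docId)) {
            int count = values.docValueCount();
            for (int i = 0; i < count; i++) {
                out.add(values.nextValue());
            }
        }
        return out;
    }
}
```

With `new FloatVectorFormattedValues(Map.of(0, new float[] {1f, 2f, 3f}))`, collecting document `0` yields a three-element list, and a document without a vector yields an empty list. Emitting components individually is one way to avoid the extra nesting level. A format class that does no formatting fits naturally here, because each component is already a plain number.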
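The data-type note about `float` vectors can be illustrated with a small sketch. It assumes, as an illustration rather than a description of Elasticsearch internals, that a `float`-element vector value is parsed from `_source` as a double and narrowed to a 32-bit `float` when stored. Values that fit exactly, such as `-2.5`, round-trip unchanged. A value like `0.123456789` picks up a tiny rounding error, far smaller than anything that would change a similarity score in practice. `FloatVectorPrecisionDemo` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration: assumes float dense_vector values are stored as
// 32-bit floats. Elasticsearch's actual indexing and serialization path may differ.
final class FloatVectorPrecisionDemo {

    // Simulates indexing: each JSON number from _source (parsed as a double)
    // is narrowed to a 32-bit float.
    static float[] toStoredFloats(double[] sourceValues) {
        float[] stored = new float[sourceValues.length];
        for (int i = 0; i < sourceValues.length; i++) {
            stored[i] = (float) sourceValues[i];
        }
        return stored;
    }

    // Largest absolute difference between the _source values and the values
    // read back from the stored floats.
    static double maxAbsError(double[] sourceValues, float[] stored) {
        double max = 0.0;
        for (int i = 0; i < sourceValues.length; i++) {
            max = Math.max(max, Math.abs(sourceValues[i] - stored[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] source = {0.123456789, -2.5, 3.0};
        float[] stored = toStoredFloats(source);
        System.out.println("_source values:  " + Arrays.toString(source));
        System.out.println("stored floats:   " + Arrays.toString(stored));
        System.out.println("max abs error:   " + maxAbsError(source, stored));
    }
}
```

The same reasoning explains why `byte` vectors show no difference: every value in the byte range is an integer that both JSON and the stored type represent exactly.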
Related Issues
Closes https://github.com/elastic/elasticsearch/issues/108470
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi team,
I've implemented support for the `docvalue_fields` parameter for `dense_vector` fields, as discussed in https://github.com/elastic/elasticsearch/issues/108470. I would greatly appreciate it if you could take a moment to review the changes.
cc: @mayya-sharipova, @benwtrent – your insights and feedback would be invaluable.
Thank you very much for your time and assistance!
Thank you, @mayya-sharipova, for your detailed feedback. I have made the following adjustments based on your suggestions:
- Updated assertions to use `close_to` instead of `lt` and `gt`.
- Removed the `DOCVALUES_FIELDS_SUPPORTED` `NodeFeature`.
- Renamed the class to `DenseVectorDocValueFormat`.
- Renamed the constant to `DENSE_VECTOR`.
- Adjusted the code to access vectors through an iterator, as per the provided example.
- Simplified the implementation by removing the two abstract classes `DenseVectorNumericByteValues` and `DenseVectorNumericFloatValues`, and introduced concrete classes in `VectorDVLeafFieldData`.
- Fixed the issue of instantiating vector values on every iteration of `docID`. Now, they are instantiated only once and reused.
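The last two changes, iterator-based access and creating vector values once per segment instead of per `docID`, follow a common Lucene-style pattern. The minimal Java sketch below shows that pattern under stated assumptions. `ForwardVectorIterator` is a hypothetical forward-only iterator in the spirit of Lucene's `DocIdSetIterator`, not Lucene's actual API. `VectorLeafReader` is a hypothetical per-segment reader that holds one iterator for its whole lifetime.

```java
// Hypothetical forward-only iterator over one segment's documents that have a
// vector, in increasing docID order (in the spirit of Lucene's DocIdSetIterator,
// but not Lucene's API).
final class ForwardVectorIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;      // sorted docIDs that have a vector
    private final float[][] vectors; // vectors[i] belongs to docIds[i]
    private int pos = -1;            // -1 means "not positioned yet"

    ForwardVectorIterator(int[] docIds, float[][] vectors) {
        this.docIds = docIds;
        this.vectors = vectors;
    }

    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
    }

    // Moves to the first document >= target. Never moves backwards, so callers
    // must request docIDs in increasing order.
    int advance(int target) {
        do {
            pos++;
        } while (pos < docIds.length && docIds[pos] < target);
        return docID();
    }

    float[] vectorValue() {
        return vectors[pos];
    }
}

// Per-segment reader: the iterator is created once and reused for every
// requested docID instead of being re-instantiated per document.
final class VectorLeafReader {
    private final ForwardVectorIterator iterator;

    VectorLeafReader(ForwardVectorIterator iterator) {
        this.iterator = iterator;
    }

    // True if docId has a vector; the iterator then stays positioned on it.
    boolean advanceExact(int docId) {
        if (iterator.docID() < docId) {
            iterator.advance(docId);
        }
        return iterator.docID() == docId;
    }

    float[] vector() {
        return iterator.vectorValue();
    }
}
```

Because the iterator only moves forward, reusing it is cheap when fetch-phase docIDs arrive in ascending order within a segment. Re-creating it for each document would instead restart the scan from the beginning every time.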
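Switching the assertions from `lt`/`gt` pairs to `close_to` expresses the tolerance for `float` `docvalue_fields` directly. The check then states one expected value plus an allowed error instead of two hand-picked bounds. A minimal Java sketch of that check follows. `VectorAssertions` is a hypothetical name, and this is not how the YAML test framework's `close_to` assertion is implemented.

```java
// Hypothetical helper mirroring the idea behind a close_to assertion:
// |actual - expected| <= error, checked component by component.
final class VectorAssertions {

    static boolean closeTo(double actual, double expected, double error) {
        return Math.abs(actual - expected) <= error;
    }

    // All components must be within the tolerance, and the dimensions must match.
    static boolean allCloseTo(float[] actual, double[] expected, double error) {
        if (actual.length != expected.length) {
            return false;
        }
        for (int i = 0; i < actual.length; i++) {
            if (!closeTo(actual[i], expected[i], error)) {
                return false;
            }
        }
        return true;
    }
}
```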
@mayya-sharipova, thank you for your comments. I've already incorporated some of your suggestions and am looking forward to more of your feedback.
@Rassyan I think the solution I provided in the last commit should work.
Let's make the test more advanced, and I think this PR will be ready after that.
@elasticmachine test this please
@Rassyan I started CI, and some tests are failing:
REPRODUCE WITH: ./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.vectors.DenseVectorFieldTypeTests.testDocValueFormat" -Dtests.seed=5E3E0C56725338D7 -Dtests.locale=ebu-KE -Dtests.timezone=Australia/West -Druntime.java=22
|
| DenseVectorFieldTypeTests > testDocValueFormat FAILED
| junit.framework.AssertionFailedError: Expected exception IllegalArgumentException but no exception was thrown
| at __randomizedtesting.SeedInfo.seed([5E3E0C56725338D7:ACD632D4793932D0]:0)
| at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2889)
| at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2875)
| at org.elasticsearch.index.mapper.vectors.DenseVectorFieldTypeTests.testDocValueFormat(DenseVectorFieldTypeTests.java:137)
Please fix them.
@elasticsearchmachine test this please
@elasticmachine update branch
@elasticmachine test this please
@elasticmachine test this please
@elasticmachine test this please
@elasticmachine test this please
@elasticmachine run elasticsearch-ci/part-4
@elasticmachine run "elasticsearch-ci/windows-2019 / default-windows-archive / packaging-tests-windows-sample"
@elasticmachine update branch
@elasticmachine test this please
@elasticmachine run elasticsearch-ci/bwc-snapshots
@elasticmachine test this please
@elasticmachine run "Elasticsearch Serverless Checks"
@elasticmachine test this please
@Rassyan Thank you for your work and congratulations, this has been merged for 8.17 release.
💚 All backports created successfully
| Status | Branch | Result |
|---|---|---|
| ✅ | 8.x | |
Questions?
Please refer to the Backport tool documentation