elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

Add `docvalue_fields` Support for `dense_vector` Fields

Open Rassyan opened this issue 1 year ago • 6 comments

Summary

This PR implements support for the docvalue_fields parameter for dense_vector fields, as described in issue #108470.

Details

  • Single Value Constraint:

    • Each document's dense_vector field contains a single vector value (multi-valued vectors are not allowed). The vector value is an array of numeric types. The docvalue_fields response will return this array directly as the value of fields.vector, without additional nesting.
  • Data Type Handling:

    • For byte type dense_vector, the docvalue_fields response will be identical to the _source response.
    • For float type dense_vector, similar to other floating-point docvalue_fields, there may be minor precision differences between the docvalue_fields and _source responses. However, these differences are negligible and do not affect vector comparisons.
  • Implementation:

    • Introduced a new DenseDocValueFormat class to handle dense_vector docvalue_fields. This class does not require actual formatting.
    • Overridden the VectorDVLeafFieldData#getFormattedValues method to return FormattedDocValues for byte, bit, and float type dense_vector fields.
  • Testing:

    • Added YAML tests to cover the docvalue_fields parameter for dense_vector fields.

Related Issues

Closes https://github.com/elastic/elasticsearch/issues/108470

Rassyan avatar Oct 10 '24 08:10 Rassyan

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine avatar Oct 10 '24 11:10 elasticsearchmachine

Hi team,

I've implemented the support for the docvalue_fields parameter for dense_vector fields as discussed in https://github.com/elastic/elasticsearch/issues/108470. I would greatly appreciate it if you could take a moment to review the changes.

cc: @mayya-sharipova, @benwtrent – your insights and feedback would be invaluable.

Thank you very much for your time and assistance!

Rassyan avatar Oct 10 '24 12:10 Rassyan

Thank you, @mayya-sharipova , for your detailed feedback. I have made the following adjustments based on your suggestions:

  1. Updated assertions to use close_to instead of lt and gt.
  2. Removed DOCVALUES_FIELDS_SUPPORTED NodeFeature.
  3. Renamed the class to DenseVectorDocValueFormat.
  4. Renamed the constant to DENSE_VECTOR.
  5. Adjusted the code to access vectors through an iterator as per the provided example.
  6. Simplified the implementation by removing the two abstract classes DenseVectorNumericByteValues and DenseVectorNumericFloatValues, and introduced concrete classes in VectorDVLeafFieldData.
  7. Fixed the issue of instantiating vector values on every iteration of docID. Now, they are instantiated only once and reused.

Rassyan avatar Oct 23 '24 09:10 Rassyan

@mayya-sharipova, thank you for your comments. I've already incorporated some of your suggestions and am looking forward to more of your feedback.

Rassyan avatar Oct 24 '24 04:10 Rassyan

@Rassyan I think the solution I provided in the last commit should work.

Let's make the test more advanced, and I think this PR will be ready after that.

mayya-sharipova avatar Oct 30 '24 15:10 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Oct 31 '24 13:10 mayya-sharipova

@Rassyan I started CI, and some tests are failing:

REPRODUCE WITH: ./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.vectors.DenseVectorFieldTypeTests.testDocValueFormat" -Dtests.seed=5E3E0C56725338D7 -Dtests.locale=ebu-KE -Dtests.timezone=Australia/West -Druntime.java=22
  |  
  | DenseVectorFieldTypeTests > testDocValueFormat FAILED
  | junit.framework.AssertionFailedError: Expected exception IllegalArgumentException but no exception was thrown
  | at __randomizedtesting.SeedInfo.seed([5E3E0C56725338D7:ACD632D4793932D0]:0)
  | at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2889)
  | at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2875)
  | at org.elasticsearch.index.mapper.vectors.DenseVectorFieldTypeTests.testDocValueFormat(DenseVectorFieldTypeTests.java:137)
 

<br class="Apple-interchange-newline">

Please fix them.

mayya-sharipova avatar Oct 31 '24 14:10 mayya-sharipova

@elasticsearchmachine test this please

mayya-sharipova avatar Nov 01 '24 14:11 mayya-sharipova

@elasticmachine update branch

mayya-sharipova avatar Nov 01 '24 17:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 01 '24 18:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 04 '24 16:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 06 '24 16:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 06 '24 19:11 mayya-sharipova

@elasticmachine run elasticsearch-ci/part-4

mayya-sharipova avatar Nov 06 '24 19:11 mayya-sharipova

@elasticmachine run "elasticsearch-ci/windows-2019 / default-windows-archive / packaging-tests-windows-sample"

mayya-sharipova avatar Nov 06 '24 21:11 mayya-sharipova

@elasticmachine update branch

mayya-sharipova avatar Nov 07 '24 01:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 07 '24 11:11 mayya-sharipova

@elasticmachine run elasticsearch-ci/bwc-snapshots

mayya-sharipova avatar Nov 07 '24 14:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 07 '24 17:11 mayya-sharipova

@elasticmachine run "Elasticsearch Serverless Checks"

mayya-sharipova avatar Nov 07 '24 20:11 mayya-sharipova

@elasticmachine test this please

mayya-sharipova avatar Nov 07 '24 21:11 mayya-sharipova

@Rassyan Thank you for your work and congratulations, this has been merged for 8.17 release.

mayya-sharipova avatar Nov 07 '24 22:11 mayya-sharipova

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

benwtrent avatar Nov 08 '24 14:11 benwtrent