[Feature Request] Support index sorting for indices with nested fields

Open jhinch-at-atlassian-com opened this issue 11 months ago • 6 comments

Is your feature request related to a problem? Please describe

Currently, index sorting and nested fields are mutually exclusive features (validated by DocumentMapper). However, it should be possible to allow them to be used together under certain constraints.

Describe the solution you'd like

Index sorting should be possible if the fields used for sorting are not on the nested documents and are instead only on the parent document. This would allow nested documents to keep doc IDs adjacent to those of their parent document.
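
The constraint described here can be sketched as a check over the mapping: reject an index sort only when one of the sort fields lives under a nested path. This is a minimal illustration, not the actual DocumentMapper logic; the class name `NestedSortCheck`, its methods, and the dotted-path convention for nested fields are assumptions made for the example.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of OpenSearch. */
final class NestedSortCheck {

    /** Returns true if the field is a nested path itself or sits underneath one. */
    static boolean isUnderNestedPath(String field, Set<String> nestedPaths) {
        for (String path : nestedPaths) {
            if (field.equals(path) || field.startsWith(path + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Throws if any index sort field belongs to a nested document. */
    static void validate(List<String> sortFields, Set<String> nestedPaths) {
        for (String field : sortFields) {
            if (isUnderNestedPath(field, nestedPaths)) {
                throw new IllegalArgumentException(
                    "index sort field [" + field + "] must not be inside a nested object");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> nested = Set.of("comments");
        // Accepted: both sort fields are on the parent document.
        validate(List.of("created_at", "priority"), nested);
        try {
            // Rejected: the sort field is on the nested document.
            validate(List.of("comments.date"), nested);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this shape, an index that has nested fields could still declare a sort as long as every sort field is defined on the root document.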

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

It's not clear to me whether simply tweaking the validation logic would be sufficient or whether additional changes are required when applying the index sort.

jhinch-at-atlassian-com avatar Jan 15 '25 00:01 jhinch-at-atlassian-com

Catch All Triage - 1, 2, 3, 4, 5

andrross avatar Feb 03 '25 17:02 andrross

+1 for this

Elasticsearch implemented it last year: https://github.com/elastic/elasticsearch/pull/110251

michelemottini avatar May 17 '25 13:05 michelemottini

+1

This will speed up the response times of default searches in our search indexes with nested fields considerably. Our default sort expression uses 4 fields at the root and having the data presorted will be a great performance improvement.

One more detail makes this feature even more important for us. We have an analyzed text field that accounts for 20% of the total size of the index. Queries against the index are 20% slower simply because that field is present, even when we neither retrieve nor filter by it. This impact on query response time goes away if we define the field as nested, but once it is defined as nested we cannot use the index sort feature.

homerogon avatar May 28 '25 19:05 homerogon

Lucene introduced https://github.com/apache/lucene/pull/12829 which should allow us to build the support for sorting of nested documents.

mgodwan avatar Jun 05 '25 08:06 mgodwan

Working on this issue.

vishdivs avatar Jun 11 '25 18:06 vishdivs

Support for Index Sort in Document Blocks (Nested Fields)

Introduction:

Lucene has introduced support for Index Sort with Nested fields through this change[1]. The change adds a parentField parameter to IndexWriterConfig, which creates an internal field for every root Document. Under this scheme, a single document is treated as a parent, while in a document block the last document is designated as the parent. This makes it possible to sort an index by field value; ties between documents with identical sort values are resolved using their DocIds.
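
To make the ordering rule concrete, here is a toy model of a block-preserving sort. It does not use Lucene; the `BlockSortSketch` and `Doc` classes are hypothetical and only illustrate the behaviour described above: each block ends at its parent, whole blocks are ordered by the parent's sort value, and ties fall back to the original doc ID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of a block-preserving index sort; it does not call Lucene. */
final class BlockSortSketch {

    /** Minimal document: original doc ID, sort value (read only on parents), parent flag. */
    static final class Doc {
        final int docId;
        final long sortValue;
        final boolean isParent;

        Doc(int docId, long sortValue, boolean isParent) {
            this.docId = docId;
            this.sortValue = sortValue;
            this.isParent = isParent;
        }
    }

    /** Groups documents into blocks (each block ends at a parent), then sorts whole blocks. */
    static List<Doc> sortBlocks(List<Doc> docs) {
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc doc : docs) {
            current.add(doc);
            if (doc.isParent) { // the last document of a block is its parent
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("trailing documents without a parent");
        }

        // Order blocks by the parent's sort value; break ties by the parent's original doc ID.
        Comparator<List<Doc>> byParent = Comparator
            .comparingLong((List<Doc> block) -> block.get(block.size() - 1).sortValue)
            .thenComparingInt(block -> block.get(block.size() - 1).docId);
        blocks.sort(byParent);

        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> block : blocks) {
            sorted.addAll(block); // children stay directly in front of their parent
        }
        return sorted;
    }
}
```

Because blocks move as a unit, the child documents never get separated from their parent, which is the property block-join queries over nested documents rely on.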

Current State:

OpenSearch currently implements a validator at the shard level that prevents users from performing Index Sort operations on Nested Fields by throwing an error during document mapping creation[2].

Proposed Solution:

Since Lucene now supports IndexWriterConfig#setParentField [3] for creating an internal field to enable index sorting, OpenSearch needs to integrate this functionality by setting the Parent Field in appropriate workflows with proper validations.

Required Validations:

  1. Parent Field should only be set when Index Sort is configured.
  2. To handle backward compatibility, we need a validator, since this introduces a new Internal Reserved Field (ParentField). This validator will ensure the feature is only enabled for new index versions, preventing conflicts with existing fields in older indices.
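
The two validations above could be combined roughly as follows. This is a sketch under stated assumptions: `ParentFieldPolicy`, the integer version IDs, the `MIN_SUPPORTED_VERSION` threshold, and the field name are all placeholders invented for the example and do not correspond to actual OpenSearch constants.

```java
import java.util.Optional;

/** Hypothetical policy helper; names and version IDs are placeholders, not OpenSearch code. */
final class ParentFieldPolicy {

    /** Placeholder for the first index-created version that supports the parent field. */
    static final int MIN_SUPPORTED_VERSION = 2;

    /** Placeholder name for the internal reserved field. */
    static final String PARENT_FIELD = "__example_parent_field";

    /**
     * Validation 1 and 2: the parent field is set only when index sort is configured
     * and the index was created on a version that knows about the reserved field.
     */
    static Optional<String> parentField(boolean indexSortConfigured, int indexCreatedVersion) {
        if (!indexSortConfigured || indexCreatedVersion < MIN_SUPPORTED_VERSION) {
            return Optional.empty();
        }
        return Optional.of(PARENT_FIELD);
    }

    /** Older indices keep the existing restriction: nested fields and index sort cannot coexist. */
    static void validateNestedWithSort(boolean hasNested, boolean indexSortConfigured,
                                       int indexCreatedVersion) {
        if (hasNested && indexSortConfigured && indexCreatedVersion < MIN_SUPPORTED_VERSION) {
            throw new IllegalArgumentException(
                "nested fields with index sort require a newer index version");
        }
    }
}
```

Returning an empty `Optional` for unsorted or older indices keeps their writer configuration unchanged, so existing indices are not affected by the new reserved field.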

Workflows Requiring Changes:

  1. Index Shard / Engine Creation:

    During creation of the Lucene engine, we need to update IndexWriterConfig to include the parent field setting.[4]

  2. Recovery Flow:

    When recovering an index shard from local shards or from a remote shard, the Parent Field setting needs to be properly configured in the recovery workflow[5]. This will also require changes in the remote store creation flow, where we create a separate IndexWriter. These updates are necessary because the recovery flow extracts snapshots from this store[6].

  3. Updating Document Mapper Validator:

    Update the DocumentMapper validation so that it applies only to older indices, as newer indices will support the Parent Field.[2]

References:

[1] https://github.com/apache/lucene/pull/12829

[2] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/DocumentMapper.java#L331

[3] https://github.com/apache/lucene/pull/12829/files#diff-0c2b93acc3fb0d7903df65eb5d5e381d46cf7e3e8ed26f09baeedca6c6400e16R550-R563

[4] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/engine/InternalEngine.java#L2384

[5] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/shard/StoreRecovery.java#L241

[6] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/store/Store.java#L1948

vishdivs avatar Jun 12 '25 11:06 vishdivs