[Feature Request] Support index sorting for indices with nested fields
Is your feature request related to a problem? Please describe
Currently, index sorting and nested fields are mutually exclusive features (validated by DocumentMapper). However, it should be possible to allow them to be used together under certain constraints.
Describe the solution you'd like
Index sorting should be possible if the fields used for sorting are not on the nested documents and are instead only on the parent document. This would allow the nested documents to keep contiguous doc IDs immediately before their parent document, which is the layout Lucene document blocks require.
Related component
Indexing
Describe alternatives you've considered
No response
Additional context
It's not clear to me whether simply tweaking the validation logic would be sufficient, or whether additional changes are required when applying the index sort.
+1 for this
Elasticsearch implemented it last year: https://github.com/elastic/elasticsearch/pull/110251
+1
This would considerably speed up the response times of the default searches on our indices with nested fields. Our default sort expression uses 4 root-level fields, so having the data presorted would be a significant performance improvement.
One more detail makes this feature even more important for us. We have an analyzed text field that accounts for 20% of the total index size. Queries against the index are 20% slower just because that field exists, even when they neither retrieve nor filter on it; this impact on query response time goes away if we define the field as nested. But once the field is nested, we cannot use index sorting.
Lucene introduced https://github.com/apache/lucene/pull/12829 which should allow us to build the support for sorting of nested documents.
Working on this issue.
Support for Index Sort in Document Blocks (Nested Fields)
Introduction:
Lucene has introduced support for index sorting with nested fields through this change [1]. It adds a parentField setting to IndexWriterConfig, which creates an internal field on every root document. Single documents are treated as parents, while in a document block the last document is designated as the parent. This enables sorting an index by field values while keeping each block intact, with ties between documents that have identical sort values resolved by doc ID.
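To make the resulting order concrete, here is a plain-Java model of what a block-aware index sort does. It is only an illustration, not Lucene's implementation (in Lucene the sort is applied inside IndexWriter once IndexWriterConfig#setParentField is set [3]), and Doc, Block and sortByParent are hypothetical names. Each block is its nested documents followed by the parent, only the parent's sort value is compared, and ties keep the original doc-ID order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative model (not Lucene code) of index sorting over document blocks.
 * A block is its child docs followed by its parent; blocks are reordered as a
 * unit by the parent's sort value, ties broken by original doc ID.
 */
public class BlockSortSketch {

    /** Hypothetical doc: name, sort value, and whether it is the parent (last doc of its block). */
    record Doc(String name, long sortValue, boolean isParent) {}

    /** A run of consecutive docs ending in its parent; firstDocId is its original position. */
    record Block(int firstDocId, List<Doc> docs) {
        long parentSortValue() {
            return docs.get(docs.size() - 1).sortValue();
        }
    }

    /** Splits the flat doc-ID order into blocks, closing a block at every parent doc. */
    static List<Block> toBlocks(List<Doc> docsInDocIdOrder) {
        List<Block> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        int start = 0;
        for (int docId = 0; docId < docsInDocIdOrder.size(); docId++) {
            Doc doc = docsInDocIdOrder.get(docId);
            current.add(doc);
            if (doc.isParent()) {
                blocks.add(new Block(start, current));
                current = new ArrayList<>();
                start = docId + 1;
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("trailing docs without a parent");
        }
        return blocks;
    }

    /** Sorts whole blocks by parent sort value; ties fall back to the original doc ID. */
    static List<Doc> sortByParent(List<Doc> docsInDocIdOrder) {
        List<Block> blocks = toBlocks(docsInDocIdOrder);
        blocks.sort(Comparator.comparingLong(Block::parentSortValue)
                .thenComparingInt(Block::firstDocId));
        List<Doc> sorted = new ArrayList<>();
        for (Block block : blocks) {
            sorted.addAll(block.docs());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("c1", 0, false), new Doc("p1", 30, true),   // parent with one nested doc
                new Doc("p2", 10, true),                              // single doc: its own parent
                new Doc("c3a", 0, false), new Doc("c3b", 0, false), new Doc("p3", 10, true));
        for (Doc d : sortByParent(docs)) {
            System.out.print(d.name() + " ");
        }
        System.out.println(); // prints: p2 c3a c3b p3 c1 p1
    }
}
```

Because whole blocks move together, the nested documents in the output still sit immediately before their parent, and the sort values on the nested documents themselves are never consulted.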
Current State:
OpenSearch currently has a shard-level validator that prevents index sorting from being used together with nested fields: it throws an error when the document mapping is created [2].
Proposed Solution:
Since Lucene now supports IndexWriterConfig#setParentField [3] for creating an internal field that enables index sorting, OpenSearch needs to integrate this functionality by setting the parent field in the appropriate workflows, with proper validations.
Required Validations:
- Parent Field should only be set when Index Sort is configured.
- To handle backward compatibility, we need a validator, since this introduces a new internal reserved field (the parent field). This validator will ensure the feature is only enabled for new index versions, preventing conflicts with existing fields in older indices.
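The two validations can be sketched as a pair of small checks. This is a hypothetical outline, not OpenSearch code: the reserved field name, the method names, and the use of a plain int for the index-created version are all placeholders, and the actual version that introduces the feature is not decided here.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the validations: the parent field is set only when an
 * index sort is configured, and only for indices created on or after the
 * (undecided) version that introduces the reserved field.
 */
public class ParentFieldValidationSketch {

    /** Placeholder for the reserved internal parent field name. */
    static final String PARENT_FIELD = "_hypothetical_parent";

    /** Whether the IndexWriter for this index should be configured with a parent field. */
    static boolean shouldSetParentField(boolean hasIndexSort, int indexCreatedVersion, int featureVersion) {
        return hasIndexSort && indexCreatedVersion >= featureVersion;
    }

    /** Mapping-time checks; versions are simplified to ints for illustration. */
    static void validateMapping(boolean hasIndexSort, boolean hasNestedFields, Set<String> userFields,
                                int indexCreatedVersion, int featureVersion) {
        boolean parentFieldEnabled = shouldSetParentField(hasIndexSort, indexCreatedVersion, featureVersion);
        if (hasIndexSort && hasNestedFields && !parentFieldEnabled) {
            // Older indices keep today's behavior: index sort and nested fields stay mutually exclusive.
            throw new IllegalArgumentException("index sort with nested fields requires a newer index version");
        }
        if (parentFieldEnabled && userFields.contains(PARENT_FIELD)) {
            // On new indices the reserved name must not collide with a user-defined field.
            throw new IllegalArgumentException("field name [" + PARENT_FIELD + "] is reserved");
        }
    }
}
```

Gating both checks on the same version predicate means an older index that happens to contain a field with the reserved name is never affected, because the parent field is never enabled for it.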
Workflows Requiring Changes:
- Index Shard / Engine Creation:
  When the Lucene engine is created, IndexWriterConfig needs to be updated to include the parent field setting [4].
- Recovery Flow:
  When recovering an index shard from local shards or from a remote shard, the parent field setting needs to be configured in the recovery workflow [5]. This also requires changes to the remote store creation flow, where a separate IndexWriter is created. These updates are necessary because the recovery flow extracts snapshots from this store [6].
- Updating the DocumentMapper Validator:
  The DocumentMapper validation should only apply to older indices, since newer indices will support the parent field [2].
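One way to keep engine creation [4], recovery from local shards [5] and the remote store flow [6] consistent is to have every path that opens an IndexWriter derive its settings from a single factory. The sketch below is a hypothetical design in plain Java: WriterSettings stands in for the relevant part of IndexWriterConfig, and in real code forShard would end in a call to IndexWriterConfig#setParentField [3]. The explicit compatibility check is an assumption added to make the requirement visible; we expect Lucene to reject mismatched parent-field settings on its own, but that is not verified here.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical sketch: engine creation, local-shard recovery and the remote
 * store flow all obtain writer settings from forShard, so the parent field is
 * never set on one path and forgotten on another.
 */
public class WriterSettingsSketch {

    /** Stand-in for the relevant part of IndexWriterConfig: sort fields plus optional parent field. */
    record WriterSettings(List<String> sortFields, Optional<String> parentField) {}

    /** Placeholder for the reserved internal parent field name. */
    static final String PARENT_FIELD = "_hypothetical_parent";

    /** Single source of truth for all writer-creation paths. */
    static WriterSettings forShard(List<String> sortFields, boolean parentFieldEnabled) {
        // The parent field is only set when an index sort is configured.
        Optional<String> parent = (!sortFields.isEmpty() && parentFieldEnabled)
                ? Optional.of(PARENT_FIELD)
                : Optional.empty();
        return new WriterSettings(List.copyOf(sortFields), parent);
    }

    /** Recovery copies segments into a new writer; source and target must agree on the parent field. */
    static void checkCompatible(WriterSettings source, WriterSettings target) {
        if (!Objects.equals(source.parentField(), target.parentField())) {
            throw new IllegalStateException("parent field mismatch: "
                    + source.parentField() + " vs " + target.parentField());
        }
    }
}
```

Centralizing the decision in one place is the main point of the design: the version gate and the "only with index sort" rule are evaluated once, instead of being re-implemented in each of the three workflows.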
References:
[1]→ https://github.com/apache/lucene/pull/12829
[2]→ https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/DocumentMapper.java#L331
[3]→ https://github.com/apache/lucene/pull/12829/files#diff-0c2b93acc3fb0d7903df65eb5d5e381d46cf7e3e8ed26f09baeedca6c6400e16R550-R563
[4]→ https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/engine/InternalEngine.java#L2384
[5]→ https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/shard/StoreRecovery.java#L241
[6]→ https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/store/Store.java#L1948