Adds search implementation of context aware segments

Open shatejas opened this issue 3 months ago • 9 comments

The implementation prunes the search space based on context awareness when a context-aware-grouping mapper is present. It makes a best-effort attempt to extract grouping criteria from the query; if no grouping criteria are found, it searches all segments.
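
The pruning rule in the paragraph above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `SegmentPruningSketch`, `Segment`, and `selectSegments` are hypothetical names, and each segment is assumed to carry exactly one grouping-criteria string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SegmentPruningSketch {

    /** Hypothetical stand-in for a segment tagged with the grouping criteria it was indexed under. */
    public static final class Segment {
        final String name;
        final String criteria;

        public Segment(String name, String criteria) {
            this.name = name;
            this.criteria = criteria;
        }
    }

    /**
     * Returns the segments to search. A null or empty criteria set means that
     * extraction found nothing, so every segment is searched (the fallback case).
     */
    public static List<Segment> selectSegments(List<Segment> all, Set<String> criteria) {
        if (criteria == null || criteria.isEmpty()) {
            return all; // no grouping criteria extracted: search all segments
        }
        List<Segment> selected = new ArrayList<>();
        for (Segment segment : all) {
            // Keep only segments whose (assumed single) criteria value was requested.
            if (criteria.contains(segment.criteria)) {
                selected.add(segment);
            }
        }
        return selected;
    }
}
```

Under these assumptions, a query that yields the criteria `tenant-a` touches only the segments tagged `tenant-a`, while a query with no extractable criteria behaves exactly like an unpruned search.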

Dependent on Indexing PR (CIs will fail)

  • https://github.com/opensearch-project/OpenSearch/pulls/RS146BIJAY

Related Issues

Resolves #[19093]

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced context-aware search filtering capabilities that enable segment-level query optimizations based on context criteria.
    • Added automatic criteria extraction from queries to support context-aware filtering without manual configuration.
    • Enhanced script validation to support stored scripts in context-aware grouping configurations.
  • Tests

    • Expanded test coverage for criteria-based filtering and context-aware query extraction functionality.

shatejas avatar Oct 07 '25 22:10 shatejas

:x: Gradle check result for 0ece83d16d4c826e962e8120847aa63226f92246: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 07 '25 22:10 github-actions[bot]

:x: Gradle check result for e5db2d910a2881b358ff4ddbe91646c5a102c9b0: FAILURE

github-actions[bot] avatar Oct 11 '25 00:10 github-actions[bot]

:x: Gradle check result for 9b887d8414f51ce4181aa15fac34eddab3441d10: FAILURE

github-actions[bot] avatar Oct 11 '25 00:10 github-actions[bot]

:x: Gradle check result for b6eb84e59cac8db2af1efa619762a90c1d587998: FAILURE

github-actions[bot] avatar Oct 20 '25 23:10 github-actions[bot]

:x: Gradle check result for 74ac3dcd919260317c6377fbe5f31d75aa1e217e: FAILURE

github-actions[bot] avatar Oct 21 '25 00:10 github-actions[bot]

:x: Gradle check result for ab40f3115a1476300d7e1c82a500263817b9b864: FAILURE

github-actions[bot] avatar Oct 22 '25 20:10 github-actions[bot]

:x: Gradle check result for 71b42b41156122c184138510313416747f96d08b: FAILURE

github-actions[bot] avatar Oct 23 '25 23:10 github-actions[bot]

:x: Gradle check result for 2b4cb4c3b9d842c1bc76217ebf85e74a52bf31f6: FAILURE

github-actions[bot] avatar Nov 25 '25 16:11 github-actions[bot]

Walkthrough

This pull request introduces context-aware criteria-based segment filtering infrastructure for OpenSearch. It adds APIs for extracting grouping criteria from queries, filtering directory readers by criteria, propagating criteria through the search stack (Engine, IndexShard, SearchService), and new query builder interfaces to support segment-level optimization.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Directory Reader Context Support | server/src/main/java/org/opensearch/common/lucene/index/OpenSearchDirectoryReader.java | Adds contextAwareReadersLeafReaderMap and methods to build/retrieve criteria-based filtered readers; introduces ChildDirectoryReader inner class with cache key isolation; extends DelegatingCacheHelper and DelegatingCacheKey constructors for per-criteria cache management |
| Searcher Acquisition API Extensions | server/src/main/java/org/opensearch/index/engine/Engine.java, server/src/main/java/org/opensearch/index/shard/IndexShard.java | Extends acquireSearcherSupplier and acquireSearcher methods with optional context-aware grouping criteria parameters; propagates criteria through searcher acquisition calls to enable segment-level filtering |
| Context-Aware Criteria Extraction | server/src/main/java/org/opensearch/search/contextaware/ContextAwareCriteriaQueryExtraction.java, server/src/main/java/org/opensearch/search/contextaware/package-info.java | New class to analyze query builders and extract context-aware criteria via recursive query traversal; handles TermQuery, TermsQuery, WithFilterQueryBuilder, and BoolQuery with precedence logic; supports script-based transformations |
| Query Builder and Field Mapper Enhancements | server/src/main/java/org/opensearch/index/query/WithFilterQueryBuilder.java, server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java | Adds WithFilterQueryBuilder interface for filter component access; relaxes ContextAwareGroupingFieldMapper script validation to allow STORED scripts regardless of language |
| Search Service Integration | server/src/main/java/org/opensearch/search/SearchService.java | Imports and integrates ContextAwareCriteriaQueryExtraction; adds getContextAwareGroupingCriteria helper to conditionally extract criteria when enabled; passes criteria to searcher acquisition throughout search execution |
| Test Infrastructure | test/framework/src/main/java/org/opensearch/index/MapperTestUtils.java, test/framework/src/main/java/org/opensearch/script/MockScriptEngine.java | Extends MapperTestUtils with ScriptService parameter propagation; updates MockScriptEngine to handle ContextAwareGroupingScript return value conversion via String.valueOf() |
| Comprehensive Test Coverage | server/src/test/java/org/opensearch/common/lucene/index/OpenSearchDirectoryReaderTests.java, server/src/test/java/org/opensearch/search/contextaware/ContextAwareCriteriaQueryExtractionTests.java | Adds testCriteriaBasedReaders covering segment filtering and caching; introduces ContextAwareCriteriaQueryExtractionTests with extensive coverage of query analysis, script execution, and boolean query precedence |
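
To make the "Context-Aware Criteria Extraction" entry more concrete, here is a minimal sketch of extracting criteria from a term-style query on the grouping field. It does not use the real OpenSearch query builders: `TermCriteriaSketch`, `FieldValuesQuery`, and `extract` are hypothetical names, and the `transform` function only stands in for whatever the grouping script does to a raw field value.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class TermCriteriaSketch {

    /** Hypothetical, simplified stand-in for a term or terms query: one field, one or more values. */
    public static final class FieldValuesQuery {
        final String field;
        final List<String> values;

        public FieldValuesQuery(String field, List<String> values) {
            this.field = field;
            this.values = values;
        }
    }

    /**
     * Returns the criteria implied by the query, or an empty set when the query
     * does not target the grouping field. The transform plays the role of the
     * grouping script that maps a raw field value to a criteria string.
     */
    public static Set<String> extract(FieldValuesQuery query, String groupingField,
                                      Function<String, String> transform) {
        Set<String> criteria = new LinkedHashSet<>();
        if (!query.field.equals(groupingField)) {
            return criteria; // unrelated field: nothing to extract
        }
        for (String value : query.values) {
            criteria.add(transform.apply(value)); // duplicates collapse in the set
        }
        return criteria;
    }
}
```

For example, with a hypothetical transform that maps an HTTP status code to its class (`200` to `2xx`), a terms-style query on `status` with values `200`, `204`, and `404` would yield the two criteria `2xx` and `4xx`.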

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchService
    participant ContextAwareExtraction
    participant Engine
    participant IndexShard
    participant DirectoryReader

    Client->>SearchService: execute search with query
    activate SearchService
    SearchService->>ContextAwareExtraction: extractCriteria(query)
    activate ContextAwareExtraction
    ContextAwareExtraction->>ContextAwareExtraction: traverse query tree<br/>(TermQuery, TermsQuery,<br/>BoolQuery, WithFilterQueryBuilder)
    ContextAwareExtraction-->>SearchService: return Set<String> criteria
    deactivate ContextAwareExtraction
    
    SearchService->>IndexShard: acquireSearcherSupplier(..., criteria)
    activate IndexShard
    IndexShard->>Engine: acquireSearcherSupplier(..., criteria)
    activate Engine
    alt criteria present and non-empty
        Engine->>DirectoryReader: getCriteriaBasedReader(criteria)
        activate DirectoryReader
        DirectoryReader->>DirectoryReader: warmUpCriteriaBasedReader()<br/>scan segments for bucket metadata
        DirectoryReader->>DirectoryReader: createChildDirectoryReader()<br/>filter segments matching criteria
        DirectoryReader-->>Engine: filtered DirectoryReader
        deactivate DirectoryReader
    else no criteria
        Engine->>DirectoryReader: standard acquire()
        DirectoryReader-->>Engine: standard DirectoryReader
    end
    Engine-->>IndexShard: SearcherSupplier
    deactivate Engine
    IndexShard-->>SearchService: SearcherSupplier
    deactivate IndexShard
    SearchService->>SearchService: execute search with filtered reader
    deactivate SearchService
    SearchService-->>Client: search results
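
The `alt` branch in the diagram (a filtered reader when criteria are present, the standard reader otherwise) together with per-criteria caching can be sketched as below. This is an assumption-laden illustration rather than the `OpenSearchDirectoryReader` implementation: `CriteriaReaderCacheSketch`, `acquire`, and `buildCount` are hypothetical, and a "reader" is reduced to a list of segment names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CriteriaReaderCacheSketch {
    // Segment name -> criteria it was indexed under (sorted for deterministic output).
    private final TreeMap<String, String> segmentToCriteria;
    // One cached filtered view per distinct criteria set.
    private final Map<Set<String>, List<String>> viewsByCriteria = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    public CriteriaReaderCacheSketch(Map<String, String> segmentToCriteria) {
        this.segmentToCriteria = new TreeMap<>(segmentToCriteria);
    }

    /** Returns the filtered view for the criteria, or all segments when none are given. */
    public List<String> acquire(Set<String> criteria) {
        if (criteria == null || criteria.isEmpty()) {
            return new ArrayList<>(segmentToCriteria.keySet()); // standard, unfiltered path
        }
        // Set.copyOf gives an immutable key, so later changes by the caller cannot corrupt the cache.
        return viewsByCriteria.computeIfAbsent(Set.copyOf(criteria), this::buildView);
    }

    private List<String> buildView(Set<String> criteria) {
        builds.incrementAndGet();
        List<String> view = new ArrayList<>();
        for (Map.Entry<String, String> entry : segmentToCriteria.entrySet()) {
            String segmentCriteria = entry.getValue();
            if (segmentCriteria != null && criteria.contains(segmentCriteria)) {
                view.add(entry.getKey());
            }
        }
        return List.copyOf(view);
    }

    /** Number of filtered views built so far; repeated requests should not increase it. */
    public int buildCount() {
        return builds.get();
    }
}
```

Keying the cache on the criteria set means two searches with the same criteria share one filtered view, which is the property the per-criteria cache keys in the walkthrough appear intended to preserve.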

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • ContextAwareCriteriaQueryExtraction: Recursive query analysis with multiple edge cases, precedence rules, and script execution logic across query types (TermQuery, TermsQuery, BoolQuery, WithFilterQueryBuilder, custom queries)
  • OpenSearchDirectoryReader cache key management: New per-criteria cache isolation via DelegatingCacheHelper and DelegatingCacheKey constructors; ChildDirectoryReader lifecycle and cache helper override semantics
  • Cross-layer API propagation: Verify criteria parameter flows correctly through Engine → IndexShard → SearchService and that null-handling is consistent
  • Boolean query clause precedence: Ensure filter > must > should precedence is correctly implemented and tested in recursive boolean scenarios
  • Script execution and field resolution: ContextAwareGroupingScript invocation, NumberFieldMapper handling, and script-to-string conversion in MockScriptEngine
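
One way to read the filter > must > should precedence item is sketched below. The interpretation is an assumption, not confirmed by the PR: clause groups are tried in that order, the first group that yields any criteria wins, and criteria found within a single group are unioned. `BoolPrecedenceSketch`, `Term`, and `Bool` are hypothetical simplifications of the real query builders.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BoolPrecedenceSketch {

    /** Hypothetical marker type for the simplified query tree. */
    public interface Query {}

    public static final class Term implements Query {
        final String field;
        final String value;

        public Term(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    public static final class Bool implements Query {
        final List<Query> filter;
        final List<Query> must;
        final List<Query> should;

        public Bool(List<Query> filter, List<Query> must, List<Query> should) {
            this.filter = filter;
            this.must = must;
            this.should = should;
        }
    }

    /** Recursively extracts criteria; an empty result means "search all segments". */
    public static Set<String> extract(Query query, String groupingField) {
        if (query instanceof Term) {
            Term term = (Term) query;
            return term.field.equals(groupingField) ? Set.of(term.value) : Set.of();
        }
        if (query instanceof Bool) {
            Bool bool = (Bool) query;
            // Assumed precedence: filter first, then must, then should.
            for (List<Query> group : List.of(bool.filter, bool.must, bool.should)) {
                Set<String> found = extractFromGroup(group, groupingField);
                if (!found.isEmpty()) {
                    return found;
                }
            }
        }
        return Set.of(); // unknown query type or nothing found
    }

    private static Set<String> extractFromGroup(List<Query> clauses, String groupingField) {
        Set<String> merged = new LinkedHashSet<>();
        for (Query clause : clauses) {
            merged.addAll(extract(clause, groupingField)); // union within one group (assumption)
        }
        return merged;
    }
}
```

A reviewer could use a model like this to enumerate the recursive cases worth testing, for example a nested boolean query inside a filter clause competing with a term in a must clause.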

Suggested labels

Indexing

Suggested reviewers

  • msfroh
  • cwperks
  • reta

Poem

🐰 A tale of segments sorted right,
By context criteria shining bright,
Filters extract what queries yearn,
And readers learn where buckets turn,
Scripts transform each field's delight! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.26% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Adds search implementation of context aware segments' clearly describes the main objective of the PR: implementing search-time functionality for context-aware segment pruning. |
| Description check | ✅ Passed | The PR description explains the implementation's purpose and notes a dependency on an indexing PR. However, the description lacks completion of required checklist items and provides minimal technical detail about changes. |

coderabbitai[bot] avatar Dec 07 '25 07:12 coderabbitai[bot]