Support ignore_above for keyword/wildcard field and optimise text field under derived source

Open · tanik98 opened this issue 1 month ago • 9 comments

  • For keyword and wildcard fields, handle ignore_above by explicitly storing values that exceed it in a stored field and using that stored field as a fallback when deriving source.
  • For text fields, derive source from keyword sub-fields (using their doc_values) instead of explicitly storing the value. This avoids storing the value twice (as doc_values under the keyword sub-field and as a stored field under the text field).
  • When there are multiple keyword sub-fields, we go through them in order and use the first one that has a value. For ignored values, we choose the keyword sub-field with the largest ignore_above, to avoid duplicating the doc value and the fallback stored field. Updates to keyword sub-fields are append-only: a keyword sub-field cannot be removed once added, so keyword sub-fields added by later mapping updates are appended to the set of fields applicable for derived source.
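A minimal sketch of how we read the text-field approach above, in plain Java. KeywordSubField, TextDerivedSourceSketch and their methods are hypothetical names, not the PR's DerivedSourceHelper or any OpenSearch API, and a sub-field's doc values are modeled as a single nullable string. It assumes ignore_above is compared against the value's length, and that a separate fallback copy is only needed when the value is longer than the largest ignore_above among the keyword sub-fields.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of one keyword sub-field of a text field. docValue stands in for
// the sub-field's doc value for one document; null means the value exceeded ignoreAbove.
record KeywordSubField(String name, int ignoreAbove, String docValue) {}

final class TextDerivedSourceSketch {
    private TextDerivedSourceSketch() {}

    // The largest ignore_above among the keyword sub-fields (-1 if there are none).
    static int maxIgnoreAbove(List<KeywordSubField> subFields) {
        return subFields.stream().mapToInt(KeywordSubField::ignoreAbove).max().orElse(-1);
    }

    // Index time (assumption): a separate stored copy is only needed when no keyword
    // sub-field keeps the value in doc values, i.e. it is longer than the largest ignore_above.
    static boolean needsFallbackCopy(String value, List<KeywordSubField> subFields) {
        return value.length() > maxIgnoreAbove(subFields);
    }

    // Derive time: use the first keyword sub-field (in mapping order) that has a value,
    // otherwise the stored fallback copy, if any.
    static Optional<String> derive(List<KeywordSubField> subFields, String storedFallback) {
        for (KeywordSubField subField : subFields) {
            if (subField.docValue() != null) {
                return Optional.of(subField.docValue());
            }
        }
        return Optional.ofNullable(storedFallback);
    }
}
```

With sub-fields raw (ignore_above 10) and long (ignore_above 256), an 11-character value is kept by long, so no fallback copy is needed; a 300-character value is kept by neither and is read back from the stored copy.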

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for ignore_above parameter on keyword and wildcard fields.
    • Optimized text fields under derived source: values are derived from keyword sub-field doc values instead of a duplicate stored field.
  • Bug Fixes

    • Improved handling of values exceeding the ignore_above threshold for keyword and wildcard fields with derived source enabled.

tanik98 · Nov 28 '25 03:11

Walkthrough

This PR extends support for the ignore_above configuration to keyword and wildcard fields within derived source mappings. It introduces a composite field value fetcher mechanism to retrieve values from multiple sources (stored fields, doc values, and a dedicated ignore field) and updates core mapper classes to store values exceeding ignore_above in a special derived-source field for later retrieval.
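The composite fetcher described above amounts to a "first non-empty result wins" chain. The sketch below is a simplified, hypothetical model: ValueSource and CompositeValueSource are illustrative names, and fetch(int docId) is not the signature of OpenSearch's FieldValueFetcher. A source that returns null or an empty list is skipped.

```java
import java.util.List;

// Hypothetical single value source (doc values, a stored field, or the ignore_above
// fallback field). This is not OpenSearch's FieldValueFetcher signature.
@FunctionalInterface
interface ValueSource {
    List<Object> fetch(int docId);
}

// Tries each source in order and returns the first non-empty result.
final class CompositeValueSource implements ValueSource {
    private final List<ValueSource> sources;

    CompositeValueSource(List<ValueSource> sources) {
        this.sources = List.copyOf(sources); // defensive, immutable copy
    }

    @Override
    public List<Object> fetch(int docId) {
        for (ValueSource source : sources) {
            List<Object> values = source.fetch(docId);
            if (values != null && !values.isEmpty()) {
                return values; // first source with data wins; later ones are not consulted
            }
        }
        return List.of(); // no source had a value for this document
    }
}
```

Ordering is the design choice that matters: the primary source (doc values or a stored field) goes first and the ignore_above fallback last, so the fallback is only consulted when the primary has nothing. Exceptions thrown by a source propagate unchanged, since the loop does not catch them.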

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Changelog | CHANGELOG.md | Adds an entry documenting support for ignore_above on keyword/wildcard fields and the text field optimization under derived source. |
| Core mapper infrastructure | server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java, FieldValueFetcher.java | Introduces a CompositeFieldValueFetcher class that chains multiple FieldValueFetcher instances, trying each in sequence until a non-empty result is found. Updates FieldValueFetcher.write to handle null value lists. |
| Field type additions | server/src/main/java/org/opensearch/index/mapper/StringFieldType.java | Adds a derivedSourceIgnoreFieldName() method and a private constant IGNORED_VALUE_FIELD_SUFFIX to support derived source ignore field naming. |
| Keyword field mapper | server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java | Relaxes derived-source eligibility checks for ignore_above; introduces DerivedSourceHelper to manage value fetching from the primary field and the fallback ignore field; stores values exceeding ignore_above in a dedicated ignore field when derived source is enabled. |
| Wildcard field mapper | server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java | Updates the parse flow to store values exceeding ignore_above in the derived source ignore field; relaxes guard conditions in canDeriveSourceInternal; replaces the single SortedSetDocValuesFetcher with a composite approach; introduces DerivedSourceHelper. |
| Text field mapper | server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java | Extends field type construction to propagate MultiFields; adds public getters/setters for derived-source keyword support and ignore length tracking; injects logic to store the ignore field when needed; reworks derived source field value fetching to use CompositeFieldValueFetcher; introduces DerivedSourceHelper to detect and collect keyword-derived fetchers. |
| Match-only text field mapper | server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java | Updates the buildFieldType method signature to accept a MultiFields parameter; threads multi-field configuration through field type construction. |
| Integration tests | server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java, recovery/FullRollingRestartIT.java, search/simple/SimpleSearchIT.java, update/UpdateIT.java | Updates derived source test mappings: extends keyword sub-field definitions with ignore_above configuration; removes store: true from text fields and doc_values: true from wildcard fields; updates document indexing and assertions to reflect the new field configurations. |
| Unit tests | server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java | New test class validating CompositeFieldValueFetcher behavior across multiple fetcher scenarios, empty lists, and exception propagation. |
| Keyword field mapper tests | server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java | Adds the factory method getMapperServiceForDerivedSource and tests for derived source behavior around the ignore_above threshold; replaces exception expectations with no-exception assertions for the canDeriveSource path. |
| Text field mapper tests | server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java | Renames and refactors the stored field test; adds multiple tests for derived source with keyword sub-fields, ignore_above thresholds, normalization, and long text handling; updates IndexAnalyzers construction with the "whitespace" analyzer. |
| Wildcard field mapper tests | server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java | Adds tests for derived value fetching and parse behavior around the ignore_above threshold with derived source enabled; introduces the getMapperServiceForDerivedSource helper and an assertDoesNotThrow utility. |
| Object mapper tests | server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java | Removes multiple derived-source validation test cases (ignore_above, nested object fields, copy_to, multi-field scenarios); simplifies the remaining multi-type field mapping by removing stored and doc_values flags. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Parser as Field Parser
    participant IgnoreField as Ignore Field Storage
    participant Derived as Derived Source Generator
    participant Composite as CompositeFieldValueFetcher
    participant Primary as Primary Fetcher
    participant Fallback as Fallback Fetcher

    Client->>Parser: Index document with text exceeding ignore_above
    Parser->>Parser: Check if derived source enabled & value > ignore_above
    alt Value exceeds ignore_above threshold
        Parser->>IgnoreField: Store value in derivedSourceIgnore field
        Parser->>Derived: Skip normal indexing, return early
    else Value within threshold
        Parser->>Parser: Normal field indexing
    end

    Note over Client,Fallback: Later: Deriving source for document
    Client->>Derived: Request derived source
    Derived->>Composite: Create fetcher chain
    Composite->>Primary: Try primary fetcher (doc values or stored)
    alt Primary has value
        Primary-->>Composite: Return value
        Composite-->>Derived: Return converted value
    else Primary empty
        Composite->>Fallback: Try fallback fetcher (ignore field)
        Fallback->>IgnoreField: Read from derivedSourceIgnore field
        IgnoreField-->>Fallback: Return stored value
        Fallback-->>Composite: Return converted value
        Composite-->>Derived: Return value
    end
    Derived-->>Client: Return derived source with retrieved value
```
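The flow in the sequence diagram can be modeled end to end for a single-valued keyword field, with an in-memory map standing in for the index. Everything here is hypothetical: KeywordIgnoreAboveFlow is an illustrative name, the "._ignored_sketch" suffix is made up (the PR keeps its real suffix in the private IGNORED_VALUE_FIELD_SUFFIX constant), and a list under the field name stands in for doc values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the ignore_above flow for one keyword field.
// A "document" maps internal field names to the values written under them.
final class KeywordIgnoreAboveFlow {
    // Made-up suffix for illustration; the real one is a private constant in the PR.
    static final String IGNORED_SUFFIX = "._ignored_sketch";

    private final String field;
    private final int ignoreAbove;
    private final boolean derivedSource;

    KeywordIgnoreAboveFlow(String field, int ignoreAbove, boolean derivedSource) {
        this.field = field;
        this.ignoreAbove = ignoreAbove;
        this.derivedSource = derivedSource;
    }

    String ignoreFieldName() {
        return field + IGNORED_SUFFIX;
    }

    // Index time: values up to ignoreAbove go to the primary field (standing in for doc
    // values); longer values are skipped, but kept in the ignore field under derived source.
    void parse(String value, Map<String, List<String>> doc) {
        if (value.length() > ignoreAbove) {
            if (derivedSource) {
                doc.computeIfAbsent(ignoreFieldName(), k -> new ArrayList<>()).add(value);
            }
            return;
        }
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Derive time: primary first, then the ignore field as the fallback.
    List<String> derive(Map<String, List<String>> doc) {
        List<String> primary = doc.getOrDefault(field, List.of());
        return primary.isEmpty() ? doc.getOrDefault(ignoreFieldName(), List.of()) : primary;
    }
}
```

A value exactly ignoreAbove characters long is still indexed; only strictly longer values take the fallback path, and without derived source they are dropped entirely.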

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • CompositeFieldValueFetcher logic and proper ordering of fetcher chains across multiple field mappers
  • Derived source storage and retrieval paths in KeywordFieldMapper, WildcardFieldMapper, and TextFieldMapper: verify null-safety and edge cases around ignore_above thresholds
  • Public API additions to StringFieldType and TextFieldType β€” confirm backward compatibility and semantic correctness of new getters/setters
  • DerivedSourceHelper implementations across three mapper classes β€” ensure consistency in keyword detection and fetcher collection logic
  • Multi-field wiring changes in TextFieldMapper and MatchOnlyTextFieldMapper β€” verify MultiFields parameter threading does not break existing functionality
  • Integration test updates β€” confirm that removed store and doc_values flags do not inadvertently change test expectations or mask regressions

Suggested labels

Search:Performance

Suggested reviewers

  • cwperks
  • gbbafna
  • saratvemulapalli
  • sachinpkale
  • kotwanikunal

Poem

🐰 Ignoring above, we now derive with care,
Composite fetchers fetch from everywhere,
Keyword and wildcard fields now play their part,
Derived source shines, a clever work of art!
Long text? No problem, we store what's spared! ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description provides clear technical details about the changes and approach, but is missing required template sections like 'Related Issues' and the 'Check List' to confirm testing and documentation. | Add the 'Related Issues' section (should reference #20113), complete the 'Check List' by checking the appropriate boxes, and ensure all required template sections are filled out. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.12%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main changes: adding support for ignore_above in keyword/wildcard fields and optimizing text field behavior under derived source, which aligns with the changeset. |

coderabbitai[bot] · Nov 28 '25 03:11

❌ Gradle check result for e4707a18c0491b2e6ccd38685c1b049bd1ef4e35: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] · Nov 28 '25 03:11

❌ Gradle check result for 6ad14479983738bf95a01dd95512275505434300: FAILURE

github-actions[bot] · Nov 28 '25 03:11

❌ Gradle check result for c15279382fcedfc737d22158150fc27930a1546c: FAILURE

github-actions[bot] · Dec 08 '25 17:12

❌ Gradle check result for 9a2ff8478da4a5ad4c5f0524ef8f09522ba61a89: FAILURE

github-actions[bot] · Dec 08 '25 19:12

❌ Gradle check result for de290a0187e3dea8b8dfd9d42fed3e654c52031c: FAILURE

github-actions[bot] · Dec 08 '25 19:12

❌ Gradle check result for 007aa206e6ea3cbee7e1350e9a63e7612cfa6a4a: FAILURE

github-actions[bot] · Dec 09 '25 03:12

✅ Gradle check result for 88ac3180c9d32b6fe8ca7e2e09f1c46f5fb527be: SUCCESS

github-actions[bot] · Dec 09 '25 05:12

Codecov Report

❌ Patch coverage is 82.06897% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.26%. Comparing base (d47931e) to head (88ac318).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...g/opensearch/index/mapper/WildcardFieldMapper.java | 71.42% | 6 Missing and 4 partials ⚠️ |
| ...rg/opensearch/index/mapper/KeywordFieldMapper.java | 76.47% | 5 Missing and 3 partials ⚠️ |
| ...a/org/opensearch/index/mapper/TextFieldMapper.java | 86.20% | 3 Missing and 5 partials ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##               main   #20113      +/-   ##
============================================
- Coverage     73.30%   73.26%   -0.04%     
- Complexity    71732    71761      +29     
============================================
  Files          5793     5794       +1     
  Lines        328056   328228     +172     
  Branches      47245    47285      +40     
============================================
+ Hits         240476   240490      +14     
- Misses        68264    68442     +178     
+ Partials      19316    19296      -20     
```
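The percentages in this Codecov report can be reproduced from its raw counts as a quick consistency check. Codecov counts a partial line as not covered, so coverage is hits / (hits + misses + partials), and hits + misses + partials matches the Lines row (328056 on main, 328228 on the PR). The figures shown are consistent with truncating to two decimals rather than rounding (73.2692% is shown as 73.26%). The per-file hit counts used below (25, 26 and 50) are not in the report; they are the values implied by each file's percentage and missing-line count. CoverageMath is a hypothetical helper.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: coverage percentage from Codecov-style line counts.
// Partial lines count as not covered; the result is truncated (rounded down).
final class CoverageMath {
    private CoverageMath() {}

    static BigDecimal percent(long hits, long misses, long partials, int scale) {
        long lines = hits + misses + partials;
        return BigDecimal.valueOf(hits)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(lines), scale, RoundingMode.DOWN);
    }
}
```

percent(240476, 68264, 19316, 2) gives 73.30 and percent(240490, 68442, 19296, 2) gives 73.26, the 0.04-point drop in the diff. For the patch, the three files contribute 6 + 5 + 3 = 14 misses and 4 + 3 + 5 = 12 partials, which are the 26 lines the report lists as missing coverage.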

codecov[bot] · Dec 09 '25 05:12