Support ignore_above for keyword/wildcard field and optimise text field under derived source
- For keyword and wildcard fields, handle ignore_above by explicitly storing values that exceed it in a stored field and using that stored field as a fallback.
- For text fields, derive source from sub keyword fields (using doc_values) instead of explicitly storing the value. This avoids storing duplicates (as a doc_values entry under the sub keyword field and as a stored field under the text field).
- When there are multiple keyword sub-fields, we go through each one to find the first usable one. For ignored values, we choose the keyword with the largest ignore_above, which avoids duplicating the doc value and the fallback stored field. Updates to sub keyword fields are append-only: a keyword cannot be removed once added, so new ones are appended to the applicable keyword fields for derived source as and when the mapping is updated.
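The sub-field selection described above can be sketched roughly as follows. This is an illustrative model only, not the PR's code: `TextSubKeywordSketch`, `SubKeyword`, `firstSubFieldHolding`, and `needsFallbackStoredField` are hypothetical names, and the sketch assumes a value is kept by a keyword sub-field exactly when its length is within that sub-field's ignore_above (it ignores normalizers and other mapping options).

```java
import java.util.List;
import java.util.Optional;

final class TextSubKeywordSketch {

    /** Hypothetical stand-in for a keyword sub-field; not an OpenSearch class. */
    static final class SubKeyword {
        final String name;
        final int ignoreAbove;

        SubKeyword(String name, int ignoreAbove) {
            this.name = name;
            this.ignoreAbove = ignoreAbove;
        }

        /** Assumption: a sub-field keeps a value only if its length is within ignore_above. */
        boolean keeps(String value) {
            return value.length() <= ignoreAbove;
        }
    }

    /** Read side: the first sub-field that kept the value can serve it from doc values. */
    static Optional<String> firstSubFieldHolding(String value, List<SubKeyword> subFields) {
        for (SubKeyword field : subFields) {
            if (field.keeps(value)) {
                return Optional.of(field.name);
            }
        }
        return Optional.empty();
    }

    /** Write side: a fallback copy is needed only when no sub-field keeps the value. */
    static boolean needsFallbackStoredField(String value, List<SubKeyword> subFields) {
        int maxIgnoreAbove = -1; // with no sub-fields, every value needs the fallback
        for (SubKeyword field : subFields) {
            maxIgnoreAbove = Math.max(maxIgnoreAbove, field.ignoreAbove);
        }
        return value.length() > maxIgnoreAbove;
    }

    public static void main(String[] args) {
        List<SubKeyword> subs = List.of(new SubKeyword("raw", 5), new SubKeyword("long", 20));
        System.out.println(firstSubFieldHolding("abcdefgh", subs));
        System.out.println(needsFallbackStoredField("abcdefgh", subs));
    }
}
```

Comparing against the largest ignore_above means the fallback copy is written only for values that no sub-field retains, which is the duplication the description aims to avoid.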
Summary by CodeRabbit
Release Notes
- New Features
  - Added support for the ignore_above parameter on keyword and wildcard fields.
  - Optimized text field retrieval in derived source scenarios.
- Bug Fixes
  - Improved handling of values exceeding the ignore_above threshold for keyword and wildcard fields with derived source enabled.
Walkthrough
This PR extends support for the ignore_above configuration to keyword and wildcard fields within derived source mappings. It introduces a composite field value fetcher mechanism to retrieve values from multiple sources (stored fields, doc values, and a dedicated ignore field) and updates core mapper classes to store values exceeding ignore_above in a special derived-source field for later retrieval.
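As a rough illustration of the composite fetcher idea, the sketch below chains several value sources and returns the first non-empty result. It is not the PR's `CompositeFieldValueFetcher`: the `Fetcher` interface, the `Composite` class, and the map-backed `fromMap` helper are hypothetical simplifications, and the real `FieldValueFetcher` API may differ (for example in the exceptions it declares).

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class CompositeFetcherSketch {

    /** Hypothetical, simplified fetcher contract; not the OpenSearch FieldValueFetcher API. */
    interface Fetcher {
        List<Object> fetch(int docId);
    }

    /** Tries each delegate in order and returns the first non-empty result. */
    static final class Composite implements Fetcher {
        private final List<Fetcher> delegates;

        Composite(List<Fetcher> delegates) {
            this.delegates = List.copyOf(delegates);
        }

        @Override
        public List<Object> fetch(int docId) {
            for (Fetcher delegate : delegates) {
                List<Object> values = delegate.fetch(docId);
                // Treat null the same as "no values" and move on to the next source.
                if (values != null && !values.isEmpty()) {
                    return values;
                }
            }
            return Collections.emptyList();
        }
    }

    /** In-memory stand-in for doc values or a stored field, keyed by doc id. */
    static Fetcher fromMap(Map<Integer, List<Object>> data) {
        return docId -> data.getOrDefault(docId, Collections.emptyList());
    }

    public static void main(String[] args) {
        Fetcher docValues = fromMap(Map.of(1, List.<Object>of("short")));
        Fetcher ignoredField = fromMap(Map.of(2, List.<Object>of("a value longer than ignore_above")));
        Fetcher composite = new Composite(List.of(docValues, ignoredField));
        System.out.println(composite.fetch(1));
        System.out.println(composite.fetch(2));
    }
}
```

Ordering matters in such a chain: placing the primary source first means the fallback is consulted only for documents whose value was ignored at index time.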
Changes
| Cohort / File(s) | Summary |
|---|---|
| Changelog: CHANGELOG.md | Adds entry documenting support for ignore_above on keyword/wildcard fields and text field optimization under derived source. |
| Core mapper infrastructure: server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java, FieldValueFetcher.java | Introduces CompositeFieldValueFetcher class that chains multiple FieldValueFetcher instances, trying each in sequence until a non-empty result is found. Updates FieldValueFetcher.write to handle null value lists. |
| Field type additions: server/src/main/java/org/opensearch/index/mapper/StringFieldType.java | Adds derivedSourceIgnoreFieldName() method and private constant IGNORED_VALUE_FIELD_SUFFIX to support derived source ignore field naming. |
| Keyword field mapper: server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java | Relaxes derived-source eligibility checks for ignore_above; introduces DerivedSourceHelper to manage value fetching from primary field and fallback ignore field; stores values exceeding ignore_above in dedicated ignore field when derived source is enabled. |
| Wildcard field mapper: server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java | Updates parse flow to store values exceeding ignore_above in derived source ignore field; relaxes guard conditions in canDeriveSourceInternal; replaces single SortedSetDocValuesFetcher with composite approach; introduces DerivedSourceHelper. |
| Text field mapper: server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java | Extends field type construction to propagate MultiFields; adds public getters/setters for derived-source keyword support and ignore length tracking; injects logic to store ignore field when needed; reworks derived source field value fetching to use CompositeFieldValueFetcher; introduces DerivedSourceHelper to detect and collect keyword-derived fetchers. |
| Match-only text field mapper: server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java | Updates buildFieldType method signature to accept MultiFields parameter; threads multi-field configuration through field type construction. |
| Integration tests: server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java, recovery/FullRollingRestartIT.java, search/simple/SimpleSearchIT.java, update/UpdateIT.java | Updates derived source test mappings: extends keyword sub-field definitions with ignore_above configuration; removes store: true from text fields and doc_values: true from wildcard fields; updates document indexing and assertions to reflect new field configurations. |
| Unit tests: server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java | New test class validating CompositeFieldValueFetcher behavior across multiple fetcher scenarios, empty lists, and exception propagation. |
| Keyword field mapper tests: server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java | Adds factory method getMapperServiceForDerivedSource and tests for derived source behavior with ignore_above threshold scenarios; replaces exception expectations with no-exception assertions for canDeriveSource path. |
| Text field mapper tests: server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java | Renames and refactors stored field test; adds multiple tests for derived source with keyword sub-fields, ignore_above thresholds, normalization, and long text handling; updates IndexAnalyzers construction with "whitespace" analyzer. |
| Wildcard field mapper tests: server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java | Adds tests for derived value fetching and parse behavior around ignore_above threshold with derived source enabled; introduces getMapperServiceForDerivedSource helper and assertDoesNotThrow utility. |
| Object mapper tests: server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java | Removes multiple derive-source validation test cases (ignore_above, nested object fields, copy_to, multi-field scenarios); simplifies remaining multi-type field mapping by removing stored and doc_values flags. |
Sequence Diagram
sequenceDiagram
participant Client
participant Parser as Field Parser
participant IgnoreField as Ignore Field Storage
participant Derived as Derived Source Generator
participant Composite as CompositeFieldValueFetcher
participant Primary as Primary Fetcher
participant Fallback as Fallback Fetcher
Client->>Parser: Index document with text exceeding ignore_above
Parser->>Parser: Check if derived source enabled & value > ignore_above
alt Value exceeds ignore_above threshold
Parser->>IgnoreField: Store value in derivedSourceIgnore field
Parser->>Derived: Skip normal indexing, return early
else Value within threshold
Parser->>Parser: Normal field indexing
end
Note over Client,Fallback: Later: Deriving source for document
Client->>Derived: Request derived source
Derived->>Composite: Create fetcher chain
Composite->>Primary: Try primary fetcher (doc values or stored)
alt Primary has value
Primary-->>Composite: Return value
Composite-->>Derived: Return converted value
else Primary empty
Composite->>Fallback: Try fallback fetcher (ignore field)
Fallback->>IgnoreField: Read from derivedSourceIgnore field
IgnoreField-->>Fallback: Return stored value
Fallback-->>Composite: Return converted value
Composite-->>Derived: Return value
end
Derived-->>Client: Return derived source with retrieved value
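The index-time branch of the diagram can be modelled with a small sketch. Everything here is hypothetical: the `IgnoreAboveRoutingSketch` class, the `index` method, and in particular the `SKETCH_SUFFIX` value are placeholders (the actual value of IGNORED_VALUE_FIELD_SUFFIX is not shown in this summary), and a plain map stands in for the indexed document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IgnoreAboveRoutingSketch {

    // Placeholder suffix for illustration only; the real suffix constant may differ.
    static final String SKETCH_SUFFIX = "._sketch_ignored";

    /**
     * Returns the fields that would be written for a single value.
     * A value longer than ignoreAbove skips normal indexing; with derived source
     * enabled it is kept in a separate fallback field so it can be read back later.
     */
    static Map<String, String> index(String field, String value, int ignoreAbove, boolean derivedSourceEnabled) {
        Map<String, String> written = new LinkedHashMap<>();
        if (value.length() > ignoreAbove) {
            if (derivedSourceEnabled) {
                written.put(field + SKETCH_SUFFIX, value);
            }
            return written; // return early: the primary field is not indexed
        }
        written.put(field, value);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(index("tag", "abc", 3, true));
        System.out.println(index("tag", "abcdef", 3, true));
        System.out.println(index("tag", "abcdef", 3, false));
    }
}
```

In this model the over-long value is dropped entirely when derived source is disabled, which matches the usual ignore_above behavior of not indexing such values.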
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Areas requiring extra attention:
- CompositeFieldValueFetcher logic and proper ordering of fetcher chains across multiple field mappers
- Derived source storage and retrieval paths in KeywordFieldMapper, WildcardFieldMapper, and TextFieldMapper: verify null-safety and edge cases around ignore_above thresholds
- Public API additions to StringFieldType and TextFieldType: confirm backward compatibility and semantic correctness of new getters/setters
- DerivedSourceHelper implementations across three mapper classes: ensure consistency in keyword detection and fetcher collection logic
- Multi-field wiring changes in TextFieldMapper and MatchOnlyTextFieldMapper: verify MultiFields parameter threading does not break existing functionality
- Integration test updates: confirm that removed store and doc_values flags do not inadvertently change test expectations or mask regressions
Suggested labels
Search:Performance
Suggested reviewers
- cwperks
- gbbafna
- saratvemulapalli
- sachinpkale
- kotwanikunal
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description provides clear technical details about the changes and approach, but is missing required template sections like 'Related Issues' and the 'Check List' to confirm testing and documentation. | Add the 'Related Issues' section (should reference #20113), complete the 'Check List' by checking the appropriate boxes, and ensure all required template sections are filled out. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.12% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main changes: adding support for ignore_above in keyword/wildcard fields and optimizing text field behavior under derived source, which aligns with the changeset. |
:x: Gradle check result for e4707a18c0491b2e6ccd38685c1b049bd1ef4e35: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 6ad14479983738bf95a01dd95512275505434300: FAILURE
:x: Gradle check result for c15279382fcedfc737d22158150fc27930a1546c: FAILURE
:x: Gradle check result for 9a2ff8478da4a5ad4c5f0524ef8f09522ba61a89: FAILURE
:x: Gradle check result for de290a0187e3dea8b8dfd9d42fed3e654c52031c: FAILURE
:x: Gradle check result for 007aa206e6ea3cbee7e1350e9a63e7612cfa6a4a: FAILURE
:white_check_mark: Gradle check result for 88ac3180c9d32b6fe8ca7e2e09f1c46f5fb527be: SUCCESS
Codecov Report
:x: Patch coverage is 82.06897% with 26 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 73.26%. Comparing base (d47931e) to head (88ac318).
:warning: Report is 8 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #20113 +/- ##
============================================
- Coverage 73.30% 73.26% -0.04%
- Complexity 71732 71761 +29
============================================
Files 5793 5794 +1
Lines 328056 328228 +172
Branches 47245 47285 +40
============================================
+ Hits 240476 240490 +14
- Misses 68264 68442 +178
+ Partials 19316 19296 -20