Add support for forward translog reading
Description
This commit introduces a feature to read translog operations in forward order (oldest to newest) instead of the default backward order (newest to oldest).
Changes:
- Add
index.translog.read_forwardsetting (default: false) inIndexSettings - Update MultiSnapshot to support bidirectional reading based on setting
Tests:
- testRecoveryTrimsLocalTranslogWithReadForward (RecoveryTests)
- testSeqNoCollisionWithReadForward (IndexLevelReplicationTests)
- testSnapshotReadOperationForward (LocalTranslogTests)
Related Issues
Resolves #20094
Check List
- [x] Functionality includes testing.
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
-
New Features
- Added a configurable index-level option to enable forward reading of transaction logs for snapshots and recoveries.
-
Improvements
- Resync now advances translog generation before trimming and adjusts initial sync boundaries to improve recovery correctness.
-
Tests
- Added tests validating forward translog reading across snapshots, replication/recovery flows, and recovery trimming.
-
Documentation
- Updated changelog to announce forward translog read support.
โ๏ธ Tip: You can customize this high-level summary in your review settings.
Walkthrough
Adds an index-level boolean to control translog read direction, threads it into Translog and MultiSnapshot to allow forward or backward snapshot iteration, changes translog generation handling before resync trimming, adjusts initial sync boundary computation, and adds tests exercising forward-read behavior.
Changes
| Cohort / File(s) | Summary |
|---|---|
Changelog & Index settings CHANGELOG.md, server/src/main/java/org/opensearch/index/IndexSettings.java |
Added changelog line for forward translog reading and introduced INDEX_TRANSLOG_READ_FORWARD_SETTING (index-scoped boolean, default false), backing field translogReadForward, and accessor isTranslogReadForward(). |
Translog snapshot traversal server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java |
Constructor now takes boolean readForward; traversal uses a PrimitiveIterator.OfInt to iterate either forward (0..N-1) or backward (N-1..0) while preserving per-operation deduplication. |
Translog integration server/src/main/java/org/opensearch/index/translog/Translog.java |
newMultiSnapshot(...) now forwards indexSettings().isTranslogReadForward() into the MultiSnapshot constructor (constructor signature changed). |
Resync trimming ordering server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java |
When a resync request includes TrimAboveSeqNo, the replica now calls replica.rollTranslogGeneration() before trimming previously synchronized primary terms. |
Primary/replica sync boundary server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java |
Initial resync boundary for the first batch changed to use startingSeqNo - 1 (instead of previous max-based value) when constructing ResyncReplicationRequest. |
Replication & recovery tests server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java, server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java |
Added testSeqNoCollisionWithReadForward() and testRecoveryTrimsLocalTranslogWithReadForward() to validate forward-read paths; one test addition is duplicated in the file. |
Translog unit tests server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java |
Added testSnapshotReadOperationForward() to assert forward-enabled translog snapshots return operations in forward generation order. |
Sequence Diagram(s)
sequenceDiagram
participant Config as Index Config
participant Settings as IndexSettings
participant Translog as Translog
participant Multi as MultiSnapshot
participant Files as Translog Files
Config->>Settings: load index settings
Settings->>Settings: read INDEX_TRANSLOG_READ_FORWARD_SETTING
Translog->>Settings: isTranslogReadForward()
Translog->>Multi: newMultiSnapshot(snapshots, onClose, readForward)
alt readForward == true
note right of Multi: iterator yields 0..N-1 (forward)
else
note right of Multi: iterator yields N-1..0 (backward)
end
loop replay snapshots
Multi->>Files: read snapshot at iterator index
Files-->>Multi: operations (per-file order preserved)
Multi->>Multi: deduplicate by seqNo and yield operations
end
Estimated code review effort
๐ฏ 3 (Moderate) | โฑ๏ธ ~30 minutes
- Areas needing extra attention:
-
MultiSnapshot: correctness of iterator initialization and preserved deduplication semantics for both directions. -
Translog.newMultiSnapshot: propagation of the flag and onClose/error behavior. -
TransportResyncReplicationAction: ordering ofrollTranslogGeneration()relative to trimming. -
PrimaryReplicaSyncer: correctness of the new initial boundary computation. - Tests: duplicated test in
IndexLevelReplicationTestsand deterministic behavior of forward-read tests.
-
Suggested labels
v3.4.0, backport 3.4
Suggested reviewers
- sachinpkale
- reta
- mch2
- msfroh
- andrross
- dbwiddis
- kotwanikunal
- shwetathareja
- owaiskazi19
- saratvemulapalli
Poem
๐ฐ I hopped through tlogs, from old to new,
Forward I bounded, then backward I knew.
Generations stitched in tidy array,
Seq numbers steady, no stale in my way.
The rabbit nods โ replay's set to play.
Pre-merge checks and finishing touches
โ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | โ ๏ธ Warning | Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
โ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | โ Passed | The title clearly and concisely describes the main feature added: support for forward translog reading, which matches the primary changeset. |
| Description check | โ Passed | The description covers the main changes, lists three new tests, references the resolved issue #20094, and includes completed checklist items as required by the template. |
| Linked Issues check | โ Passed | The PR addresses the core requirements from issue #20094: adds a configurable setting for forward translog reading [#20094], updates MultiSnapshot for bidirectional reading [#20094], and includes tests demonstrating the functionality [#20094]. |
| Out of Scope Changes check | โ Passed | All code changes are directly related to enabling forward translog reading: new index setting, MultiSnapshot updates, translog changes, resync trimming adjustments, and comprehensive test coverage. No unrelated or out-of-scope changes detected. |
โจ Finishing touches
๐งช Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
๐ Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
๐ฅ Commits
Reviewing files that changed from the base of the PR and between 79c1e0a371bb0f827630dc49f4696d774208b0dc and 78bc9213cce3d1bae996d8fd3344fde54c67df0c.
๐ Files selected for processing (1)
-
CHANGELOG.md(1 hunks)
โฐ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: Analyze (java)
๐ Additional comments (1)
CHANGELOG.md (1)
41-41: Changelog entry looks good.The missing closing parenthesis flagged in the previous review has been addressed, and the markdown link formatting is now correct.
Comment @coderabbitai help to get the list of available commands and usage tips.
:x: Gradle check result for c968213e127175aa6d29a5d5bf3e74f34e37db38: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for fe0811d93ef520d1d50088b60330fc89451fdb49: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for ec0ab6531e3320bacb50e06ddde962dd29976c28: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 10b201b806ec883f65090e63f1f4661ccfecc7e3: SUCCESS
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 73.24%. Comparing base (d520f8d) to head (78bc921).
:warning: Report is 2 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #20163 +/- ##
============================================
- Coverage 73.25% 73.24% -0.01%
+ Complexity 71792 71770 -22
============================================
Files 5795 5795
Lines 328287 328297 +10
Branches 47279 47282 +3
============================================
Hits 240474 240474
+ Misses 68559 68553 -6
- Partials 19254 19270 +16
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:x: Gradle check result for 996de9fcf444fd51d34d819b1a9f2c3a1136bdf8: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 6386cc88a431fc2360117a618037a5850937b512: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for f5c73034977d2ee115924eb47a775d4e3fbae09a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@xuxiong1 -- Looks like the latest test failure is related to this change
org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed
java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'}
Expected: <40L>
but: was <41L>
REPRODUCE WITH:
./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25
:x: Gradle check result for 3c85d106eed38f36bd734f346abea199c61f5caa: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@xuxiong1 -- Looks like the latest test failure is related to this change
org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'} Expected: <40L> but: was <41L>REPRODUCE WITH:./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25
Was able to reproduce with the seed.
I think the issue is from the primary replica re-sync process, the new primary would re-sync its operations (from global checkpoint to max seqNo) to the old primary (now replica), which would cause the duplicate operations (same seqNo, different term) in the old primary's translog.
The re-sync process also triggers trimOperationOfPreviousPrimaryTerms on the replica, but was passed with the max seqNo as the trimAboveSeqNo (honestly, I didn't get why we would pass the max seq instead of the start seq), so in the replica the operations between the global checkpoint to the max seqNo are not actually trimmed. Then the read forward would encounter the stale operation (read backward actually hides the issue).
The fix is to trim on replica during re-sync, starting from the global checkpoint instead of max seqNo, like how it's done in the finalizeRecovery
:x: Gradle check result for 5c8b37d6b01cfba29f28b8a9e65d97b8713ec263: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 7ca112a47cfeff51cfd9af3b26d24811ab34e755: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:grey_exclamation: Gradle check result for 78bc9213cce3d1bae996d8fd3344fde54c67df0c: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.