OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

Add support for forward translog reading

Open xuxiong1 opened this issue 1 month ago โ€ข 17 comments

Description

This commit introduces a feature to read translog operations in forward order (oldest to newest) instead of the default backward order (newest to oldest).

Changes:

  • Add index.translog.read_forward setting (default: false) in IndexSettings
  • Update MultiSnapshot to support bidirectional reading based on setting

Tests:

  • testRecoveryTrimsLocalTranslogWithReadForward (RecoveryTests)
  • testSeqNoCollisionWithReadForward (IndexLevelReplicationTests)
  • testSnapshotReadOperationForward (LocalTranslogTests)

Related Issues

Resolves #20094

Check List

  • [x] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Added a configurable index-level option to enable forward reading of transaction logs for snapshots and recoveries.
  • Improvements

    • Resync now advances translog generation before trimming and adjusts initial sync boundaries to improve recovery correctness.
  • Tests

    • Added tests validating forward translog reading across snapshots, replication/recovery flows, and recovery trimming.
  • Documentation

    • Updated changelog to announce forward translog read support.

โœ๏ธ Tip: You can customize this high-level summary in your review settings.

xuxiong1 avatar Dec 04 '25 01:12 xuxiong1

Walkthrough

Adds an index-level boolean to control translog read direction, threads it into Translog and MultiSnapshot to allow forward or backward snapshot iteration, changes translog generation handling before resync trimming, adjusts initial sync boundary computation, and adds tests exercising forward-read behavior.

Changes

Cohort / File(s) Summary
Changelog & Index settings
CHANGELOG.md, server/src/main/java/org/opensearch/index/IndexSettings.java
Added changelog line for forward translog reading and introduced INDEX_TRANSLOG_READ_FORWARD_SETTING (index-scoped boolean, default false), backing field translogReadForward, and accessor isTranslogReadForward().
Translog snapshot traversal
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
Constructor now takes boolean readForward; traversal uses a PrimitiveIterator.OfInt to iterate either forward (0..N-1) or backward (N-1..0) while preserving per-operation deduplication.
Translog integration
server/src/main/java/org/opensearch/index/translog/Translog.java
newMultiSnapshot(...) now forwards indexSettings().isTranslogReadForward() into the MultiSnapshot constructor (constructor signature changed).
Resync trimming ordering
server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java
When a resync request includes TrimAboveSeqNo, the replica now calls replica.rollTranslogGeneration() before trimming previously synchronized primary terms.
Primary/replica sync boundary
server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java
Initial resync boundary for the first batch changed to use startingSeqNo - 1 (instead of previous max-based value) when constructing ResyncReplicationRequest.
Replication & recovery tests
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java, server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
Added testSeqNoCollisionWithReadForward() and testRecoveryTrimsLocalTranslogWithReadForward() to validate forward-read paths; one test addition is duplicated in the file.
Translog unit tests
server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java
Added testSnapshotReadOperationForward() to assert forward-enabled translog snapshots return operations in forward generation order.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Index Config
    participant Settings as IndexSettings
    participant Translog as Translog
    participant Multi as MultiSnapshot
    participant Files as Translog Files

    Config->>Settings: load index settings
    Settings->>Settings: read INDEX_TRANSLOG_READ_FORWARD_SETTING
    Translog->>Settings: isTranslogReadForward()
    Translog->>Multi: newMultiSnapshot(snapshots, onClose, readForward)
    alt readForward == true
        note right of Multi: iterator yields 0..N-1 (forward)
    else
        note right of Multi: iterator yields N-1..0 (backward)
    end
    loop replay snapshots
        Multi->>Files: read snapshot at iterator index
        Files-->>Multi: operations (per-file order preserved)
        Multi->>Multi: deduplicate by seqNo and yield operations
    end

Estimated code review effort

๐ŸŽฏ 3 (Moderate) | โฑ๏ธ ~30 minutes

  • Areas needing extra attention:
    • MultiSnapshot: correctness of iterator initialization and preserved deduplication semantics for both directions.
    • Translog.newMultiSnapshot: propagation of the flag and onClose/error behavior.
    • TransportResyncReplicationAction: ordering of rollTranslogGeneration() relative to trimming.
    • PrimaryReplicaSyncer: correctness of the new initial boundary computation.
    • Tests: duplicated test in IndexLevelReplicationTests and deterministic behavior of forward-read tests.

Suggested labels

v3.4.0, backport 3.4

Suggested reviewers

  • sachinpkale
  • reta
  • mch2
  • msfroh
  • andrross
  • dbwiddis
  • kotwanikunal
  • shwetathareja
  • owaiskazi19
  • saratvemulapalli

Poem

๐Ÿฐ I hopped through tlogs, from old to new,
Forward I bounded, then backward I knew.
Generations stitched in tidy array,
Seq numbers steady, no stale in my way.
The rabbit nods โ€” replay's set to play.

Pre-merge checks and finishing touches

โŒ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage โš ๏ธ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
โœ… Passed checks (4 passed)
Check name Status Explanation
Title check โœ… Passed The title clearly and concisely describes the main feature added: support for forward translog reading, which matches the primary changeset.
Description check โœ… Passed The description covers the main changes, lists three new tests, references the resolved issue #20094, and includes completed checklist items as required by the template.
Linked Issues check โœ… Passed The PR addresses the core requirements from issue #20094: adds a configurable setting for forward translog reading [#20094], updates MultiSnapshot for bidirectional reading [#20094], and includes tests demonstrating the functionality [#20094].
Out of Scope Changes check โœ… Passed All code changes are directly related to enabling forward translog reading: new index setting, MultiSnapshot updates, translog changes, resync trimming adjustments, and comprehensive test coverage. No unrelated or out-of-scope changes detected.
โœจ Finishing touches
๐Ÿงช Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

๐Ÿ“œ Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

๐Ÿ“ฅ Commits

Reviewing files that changed from the base of the PR and between 79c1e0a371bb0f827630dc49f4696d774208b0dc and 78bc9213cce3d1bae996d8fd3344fde54c67df0c.

๐Ÿ“’ Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
โฐ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: Analyze (java)
๐Ÿ”‡ Additional comments (1)
CHANGELOG.md (1)

41-41: Changelog entry looks good.

The missing closing parenthesis flagged in the previous review has been addressed, and the markdown link formatting is now correct.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Dec 04 '25 01:12 coderabbitai[bot]

:x: Gradle check result for c968213e127175aa6d29a5d5bf3e74f34e37db38: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 04 '25 03:12 github-actions[bot]

:x: Gradle check result for fe0811d93ef520d1d50088b60330fc89451fdb49: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 05 '25 20:12 github-actions[bot]

:x: Gradle check result for ec0ab6531e3320bacb50e06ddde962dd29976c28: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 08 '25 19:12 github-actions[bot]

:white_check_mark: Gradle check result for 10b201b806ec883f65090e63f1f4661ccfecc7e3: SUCCESS

github-actions[bot] avatar Dec 08 '25 22:12 github-actions[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 73.24%. Comparing base (d520f8d) to head (78bc921). :warning: Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20163      +/-   ##
============================================
- Coverage     73.25%   73.24%   -0.01%     
+ Complexity    71792    71770      -22     
============================================
  Files          5795     5795              
  Lines        328287   328297      +10     
  Branches      47279    47282       +3     
============================================
  Hits         240474   240474              
+ Misses        68559    68553       -6     
- Partials      19254    19270      +16     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Dec 08 '25 22:12 codecov[bot]

:x: Gradle check result for 996de9fcf444fd51d34d819b1a9f2c3a1136bdf8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 01:12 github-actions[bot]

:x: Gradle check result for 6386cc88a431fc2360117a618037a5850937b512: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 02:12 github-actions[bot]

:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 07:12 github-actions[bot]

:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 17:12 github-actions[bot]

:x: Gradle check result for aaf78c5d211146dbb333072003d5ae22bed07b0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 19:12 github-actions[bot]

:x: Gradle check result for f5c73034977d2ee115924eb47a775d4e3fbae09a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 09 '25 20:12 github-actions[bot]

@xuxiong1 -- Looks like the latest test failure is related to this change

org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed

java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'}
Expected: <40L>
     but: was <41L>
REPRODUCE WITH: 
./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25

msfroh avatar Dec 09 '25 20:12 msfroh

:x: Gradle check result for 3c85d106eed38f36bd734f346abea199c61f5caa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 10 '25 02:12 github-actions[bot]

@xuxiong1 -- Looks like the latest test failure is related to this change

org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed

java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'}
Expected: <40L>
     but: was <41L>
REPRODUCE WITH: 
./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25

Was able to reproduce with the seed.

I think the issue is from the primary replica re-sync process, the new primary would re-sync its operations (from global checkpoint to max seqNo) to the old primary (now replica), which would cause the duplicate operations (same seqNo, different term) in the old primary's translog.

The re-sync process also triggers trimOperationOfPreviousPrimaryTerms on the replica, but was passed with the max seqNo as the trimAboveSeqNo (honestly, I didn't get why we would pass the max seq instead of the start seq), so in the replica the operations between the global checkpoint to the max seqNo are not actually trimmed. Then the read forward would encounter the stale operation (read backward actually hides the issue).

The fix is to trim on replica during re-sync, starting from the global checkpoint instead of max seqNo, like how it's done in the finalizeRecovery

xuxiong1 avatar Dec 10 '25 05:12 xuxiong1

:x: Gradle check result for 5c8b37d6b01cfba29f28b8a9e65d97b8713ec263: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 10 '25 05:12 github-actions[bot]

:x: Gradle check result for 7ca112a47cfeff51cfd9af3b26d24811ab34e755: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 10 '25 07:12 github-actions[bot]

:grey_exclamation: Gradle check result for 78bc9213cce3d1bae996d8fd3344fde54c67df0c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Dec 10 '25 19:12 github-actions[bot]