OpenSearch
[Segment Replication] Swap replica to writeable engine during failover.
As part of #2212, after a new primary has been selected we will need to convert its engine from an NRTReplicationEngine to an InternalEngine.
The logic we are looking for here is similar to IndexShard's resetEngineToGlobalCheckpoint. However, we cannot simply close and reopen the engine. A replica may have uncommitted operations in its index if it has synced from a refresh point of the previous primary. Lucene does not currently provide a way to convert a directory reader that refreshes on an externally provided SegmentInfos back to one that refreshes on an IndexWriter (IW). This means we can only open the InternalEngine, with its writer, from what is on disk.
A suggested sequence for replica promotion:
- Invoke SegmentInfos.commit on the replica, creating a new commit point (segments_N). It is safe to commit here outside of an IW because there are no buffers to flush and no new segments to create. This commit should also include the replica's latest local cp in its user data.
- Manually fsync the store with `directory.sync` so the commit point is durably persisted.
- Purge the xlog up to the local cp.
- Close and open up a new InternalEngine.
- Replay any ops remaining in the xlog.
- Refresh to push out latest checkpoint to replicas.
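
To make the ordering of these steps concrete, here is a minimal, self-contained sketch of the sequence as a plain-Java model. It does not use any Lucene or OpenSearch APIs: `ReplicaPromotionSketch`, `ReplicaState`, `Op`, and `promote` are hypothetical names invented for illustration, the commit and fsync are reduced to field updates, and the final refresh is not modeled. It also assumes the xlog holds ops in contiguous seqNo order, so the local cp can simply advance to the highest replayed seqNo.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPromotionSketch {

    /** Hypothetical xlog operation, identified by its sequence number. */
    static final class Op {
        final long seqNo;
        final String doc;

        Op(long seqNo, String doc) {
            this.seqNo = seqNo;
            this.doc = doc;
        }
    }

    /** Hypothetical stand-in for the replica's state; not an OpenSearch class. */
    static final class ReplicaState {
        final List<String> indexedDocs = new ArrayList<>(); // docs present in the index
        final List<Op> xlog = new ArrayList<>();            // ops in seqNo order
        long localCheckpoint = -1;           // highest seqNo already in the index
        long committedLocalCheckpoint = -1;  // local cp stored in the commit's user data
        boolean commitSynced = false;        // true once the commit point is fsynced
        String engineType = "NRTReplicationEngine";
    }

    /** Runs the promotion steps in order and returns the seqNos that were replayed. */
    static List<Long> promote(ReplicaState replica) {
        // 1. Commit the current segments, recording the local cp in user data.
        replica.committedLocalCheckpoint = replica.localCheckpoint;

        // 2. Fsync so the new commit point is durable before anything is purged.
        replica.commitSynced = true;

        // 3. Purge xlog ops already covered by the commit.
        final long purgeUpTo = replica.committedLocalCheckpoint;
        replica.xlog.removeIf(op -> op.seqNo <= purgeUpTo);

        // 4. Close the read-only engine and open a writeable one.
        replica.engineType = "InternalEngine";

        // 5. Replay whatever remains in the xlog into the new engine.
        List<Long> replayed = new ArrayList<>();
        for (Op op : replica.xlog) {
            replica.indexedDocs.add(op.doc);
            replica.localCheckpoint = Math.max(replica.localCheckpoint, op.seqNo);
            replayed.add(op.seqNo);
        }
        return replayed;
    }
}
```

For example, a replica whose local cp is 2 and whose xlog holds ops 0 through 4 would purge ops 0 to 2 and replay only ops 3 and 4. The point of the ordering is that the xlog is purged only after the commit carrying the local cp has been synced, so no operation is dropped before it is durable in the index.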
Checklist for me as I go through this...
- [x] commit SegmentInfos on the replica storing checkpoints.
- [x] Write a test asserting the engine type is swapped during failover.
- [x] Verify that the xlog is purged up to the local cp on the commit and that remaining ops are indexed.
- [x] Ensure the local cp on the replica is accurate. It is sent as the seqNo during replication and should be set by reading the index.
- [x] Prevent reindexing after promotion - engine should not revert to a previous safe commit & reindex up to global cp.