[BUG] [Remote Store] Missing remote metadata in some cases of concurrent flush and refresh executions
Describe the bug
Use cases such as taking a snapshot from a local commit require a corresponding remote metadata file that refers to the segment files of that commit. As a best effort, the snapshot flow flushes the shard to create a fresh commit and to get the respective metadata file uploaded to the remote store. It then fetches the latest metadata file for the primary term and generation of the commit. However, there are race conditions in which a matching remote metadata file is not guaranteed to be present, even for the newly created local commit.
Scenario 1
- Invoke Flush - creates a commit and invokes refresh.
- Refresh of flush uploads segments of new commit and then uploads respective metadata file.
- Acquire last commit.
- More index operations get executed and now another refresh is triggered.
- More segments get uploaded and another metadata file referring to the same primary term and generation gets uploaded.
- A request is then made to fetch the metadata file for the primary term and generation of the acquired commit.
- The retrieved metadata file does not refer to the segment files of the acquired commit.
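The steps of Scenario 1 can be replayed against a toy model. This is only a sketch: `MetadataFile`, `uploadOrder`, `latestFor`, and the segment file names are hypothetical stand-ins chosen for illustration, not OpenSearch classes or real file listings. It assumes that several metadata files can share one (primary term, generation) pair and that the fetch returns the most recently uploaded one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class Scenario1Sketch {
    /** Hypothetical stand-in for a remote metadata file; not an OpenSearch class. */
    static final class MetadataFile {
        final long primaryTerm;
        final long generation;
        final long uploadOrder;           // assumed to increase with every upload
        final Set<String> segmentFiles;

        MetadataFile(long primaryTerm, long generation, long uploadOrder, Set<String> segmentFiles) {
            this.primaryTerm = primaryTerm;
            this.generation = generation;
            this.uploadOrder = uploadOrder;
            this.segmentFiles = segmentFiles;
        }
    }

    /** Returns the most recently uploaded file for (term, generation), or null if none exists. */
    static MetadataFile latestFor(List<MetadataFile> remote, long term, long generation) {
        MetadataFile latest = null;
        for (MetadataFile f : remote) {
            boolean sameKey = f.primaryTerm == term && f.generation == generation;
            if (sameKey && (latest == null || f.uploadOrder > latest.uploadOrder)) {
                latest = f;
            }
        }
        return latest;
    }

    /** Replays Scenario 1; returns true if the fetched file does not match the acquired commit. */
    static boolean fetchedFileMismatchesCommit() {
        List<MetadataFile> remote = new ArrayList<>();

        // Flush creates a commit; its refresh uploads the matching metadata file.
        Set<String> commitSegments = Set.of("_0.cfs", "_0.si", "segments_2");
        remote.add(new MetadataFile(1L, 2L, 1L, commitSegments));

        // The last commit is acquired here: primary term 1, generation 2, commitSegments.

        // More indexing, then another refresh uploads a file under the same term and generation.
        Set<String> laterSegments = Set.of("_0.cfs", "_0.si", "_1.cfs", "_1.si", "segments_2");
        remote.add(new MetadataFile(1L, 2L, 2L, laterSegments));

        // Fetching the latest file for the acquired commit returns the later upload.
        MetadataFile fetched = latestFor(remote, 1L, 2L);
        return !fetched.segmentFiles.equals(commitSegments);
    }

    public static void main(String[] args) {
        System.out.println("mismatch = " + fetchedFileMismatchesCommit());
    }
}
```

In this model the mismatch comes purely from the ordering of uploads, which is why the real race is hard to hit deterministically.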
Scenario 2
- Invoke Flush - creates a new commit.
- Before the refresh triggered by the flush runs, another scheduled or external refresh uploads segments that are the same as those in the new commit.
- New index operations come in.
- The refresh triggered by the flush now runs, and it may also merge some of the segments.
- As a result, the metadata file it uploads no longer refers to the segments of the newly created commit.
Related component
Storage:Durability
To Reproduce
Integration tests can be written by replicating the steps of the scenarios above, but they will not be deterministic and may need many iterations before the issue shows up.
Expected behavior
For every new commit, there should be a corresponding metadata file in the remote store whose segment references exactly match the segments of that commit.
Two things can be done to solve this:
- Always fetch the oldest metadata file for the given primary term and generation. This ensures that Scenario 1 does not occur.
- When the refresh listener sees a new commit for the first time, always upload a metadata file that refers to the segments of that commit first, and only then trigger the usual upload of the reader's segments.
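The two proposals can be sketched together in a toy model. Everything here is an assumption made for illustration: `onRefresh`, `oldestFor`, `lastSeenGeneration`, and the in-memory `remote` list are hypothetical and do not correspond to the actual refresh listener or remote directory code in OpenSearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommitMetadataFixSketch {
    /** Hypothetical stand-in for a remote metadata file; not an OpenSearch class. */
    static final class MetadataFile {
        final long primaryTerm;
        final long generation;
        final Set<String> segmentFiles;

        MetadataFile(long primaryTerm, long generation, Set<String> segmentFiles) {
            this.primaryTerm = primaryTerm;
            this.generation = generation;
            this.segmentFiles = segmentFiles;
        }
    }

    // Uploads are appended, so list order doubles as upload order in this model.
    private final List<MetadataFile> remote = new ArrayList<>();
    private long lastSeenGeneration = -1L;

    /**
     * Second proposal: when a commit generation is seen for the first time, upload a
     * metadata file for the commit's own segments before the reader's segments.
     */
    void onRefresh(long term, long commitGeneration, Set<String> commitSegments, Set<String> readerSegments) {
        if (commitGeneration > lastSeenGeneration) {
            remote.add(new MetadataFile(term, commitGeneration, commitSegments));
            lastSeenGeneration = commitGeneration;
        }
        // The usual upload for the current reader follows.
        remote.add(new MetadataFile(term, commitGeneration, readerSegments));
    }

    /** First proposal: return the segments of the oldest file for (term, generation), or null. */
    Set<String> oldestFor(long term, long generation) {
        for (MetadataFile f : remote) {
            if (f.primaryTerm == term && f.generation == generation) {
                return f.segmentFiles;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CommitMetadataFixSketch shard = new CommitMetadataFixSketch();
        Set<String> commit = Set.of("_0.cfs", "segments_2");
        // By the time the listener first sees generation 2, the reader has moved on.
        Set<String> reader = Set.of("_0.cfs", "_1.cfs", "segments_2");
        shard.onRefresh(1L, 2L, commit, reader);
        System.out.println("oldest matches commit = " + commit.equals(shard.oldestFor(1L, 2L)));
    }
}
```

In this model the two proposals work as a pair: the commit-first upload guarantees that a matching file exists for each new generation, and fetching the oldest file for that generation guarantees it is the one returned. The sketch uploads two files even when the commit and reader segments are identical; a real implementation would likely skip the duplicate.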