
[Feature Request] Reduce refresh lag for remote store

Open Bukhtawar opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe

With remote store, each refresh requires the primary to upload its local segments to the remote store, and each replica must then download them on its end. This round trip adds significant lag on the replica, so queries served from the replica can return stale data.

Describe the solution you'd like

We can optimise search queries with an optimistic protocol that sends a request to the primary to check whether it has the blocks of data needed to serve real-time query results. If it does, the data blocks can be sent to the replica over the wire, avoiding the S3 upload/download path, and real-time queries can be served from the replica. There are caveats around data sync and the amount of data that needs to be copied, which depends on ingestion volume and refresh rates. The approach needs to be benchmarked and tested at scale before it can be fully realised.
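
To make the idea concrete, here is a rough, in-memory sketch of the optimistic path with a fallback to the remote store. It is only an illustration under stated assumptions, not OpenSearch code: `Primary`, `Replica`, `RemoteStore`, `try_fetch`, `sync_for_search` and the `max_transfer_bytes` cap are all hypothetical names, and segments are modelled as plain byte strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class RemoteStore:
    """Hypothetical stand-in for the blob store (e.g. S3), kept in memory."""
    segments: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Primary:
    local: Dict[str, bytes] = field(default_factory=dict)
    # Assumed cap on how much data the primary will copy directly per request.
    max_transfer_bytes: int = 1024

    def refresh(self, name: str, data: bytes, remote: RemoteStore,
                upload: bool = True) -> None:
        """Record a new segment locally; optionally upload it to the remote store."""
        self.local[name] = data
        if upload:
            remote.segments[name] = data

    def try_fetch(self, have: Set[str]) -> Optional[Dict[str, bytes]]:
        """Return segments the replica lacks, or None if the delta is too large."""
        missing = {n: d for n, d in self.local.items() if n not in have}
        if sum(len(d) for d in missing.values()) > self.max_transfer_bytes:
            return None  # decline; the replica should use the remote store
        return missing


@dataclass
class Replica:
    local: Dict[str, bytes] = field(default_factory=dict)

    def sync_for_search(self, primary: Primary, remote: RemoteStore) -> str:
        """Try the primary first; fall back to downloading from the remote store."""
        delta = primary.try_fetch(set(self.local))
        if delta is not None:
            self.local.update(delta)
            return "primary"
        for name, data in remote.segments.items():
            self.local.setdefault(name, data)
        return "remote"
```

In this sketch a small delta is copied straight from the primary, while a delta above the cap is declined and the replica falls back to whatever has already reached the remote store. How that cap should relate to ingestion volume and refresh rate is exactly the part that would need benchmarking.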

Related component

Storage:Remote

Describe alternatives you've considered

No response

Additional context

No response

Bukhtawar avatar Jan 16 '24 11:01 Bukhtawar

Do you have some idea of what the user experience would be here? Would this be something for a user to opt in to? For example, use this feature if you want replication delays equal to or better than document replication, but do not use this feature if you want to maximize ingest and search throughput.

andrross avatar Jan 16 '24 19:01 andrross

[Storage Triage]

@Bukhtawar Thanks for opening this issue. This looks promising for certain use cases.

We would need deeper investigation and some benchmarks before making a decision here.

linuxpi avatar May 02 '24 15:05 linuxpi