Use scroll API for Elasticsearch export

Open cablej opened this issue 3 years ago • 1 comments

🗣 Description

Crossfeed returns an error when attempting to export more than 10k entries from Elasticsearch. The error returned from Elasticsearch is "Result window is too large, from + size must be less than or equal to: [10000]". Elasticsearch caps from + size (10,000 by default) to keep deep pagination from using too much memory. From Elastic:

Deep Paging in Distributed Systems

To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.

Now imagine that we ask for page 1,000—​results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!

You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query.
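The arithmetic in the quoted passage can be reproduced with a small sketch. The function name `deepPagingCost` and its return shape are made up for illustration, and the model is only the simplified one from the quote: every shard returns its top `from + size` hits and the coordinating node keeps `size` of them.

```typescript
// Simplified cost model from the quoted passage (illustrative only, not an
// Elasticsearch API): each shard produces its top (from + size) results.
function deepPagingCost(shards: number, from: number, size: number) {
  const perShard = from + size;     // hits each shard must produce
  const sorted = shards * perShard; // hits the coordinating node sorts
  const discarded = sorted - size;  // hits thrown away after sorting
  return { perShard, sorted, discarded };
}

// First page: 5 shards, results 1 to 10.
console.log(deepPagingCost(5, 0, 10));
// → { perShard: 10, sorted: 50, discarded: 40 }

// Results 10,001 to 10,010.
console.log(deepPagingCost(5, 10000, 10));
// → { perShard: 10010, sorted: 50050, discarded: 50040 }
```

Under this model the work per request scales with how deep the page is, which is why every additional page of a large export costs more than the one before it.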

Instead, Elastic recommends using the scroll API, which retrieves large result sets efficiently. This PR implements the scroll API for the export function, allowing more than 10k results to be returned.
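For readers unfamiliar with the pattern, here is a rough sketch of a scroll-style export loop. It is not the code from this PR. `ScrollSource`, `Page`, `exportAll`, and `fakeSource` are hypothetical names, and the three methods only stand in for the Elasticsearch steps: an initial search with a `scroll` parameter, repeated scroll requests with the returned `scroll_id`, and a final clear-scroll request.

```typescript
// Hypothetical minimal shapes; a real Elasticsearch client has richer types.
interface Page<T> {
  scrollId: string;
  hits: T[];
}

interface ScrollSource<T> {
  open(pageSize: number): Promise<Page<T>>; // initial search with scroll enabled
  next(scrollId: string): Promise<Page<T>>; // fetch the next batch
  close(scrollId: string): Promise<void>;   // release the scroll context
}

// Drains every page until an empty batch comes back.
async function exportAll<T>(source: ScrollSource<T>, pageSize = 1000): Promise<T[]> {
  const results: T[] = [];
  let page = await source.open(pageSize);
  let scrollId = page.scrollId;
  try {
    while (page.hits.length > 0) {
      for (const hit of page.hits) results.push(hit);
      page = await source.next(scrollId);
      scrollId = page.scrollId; // always use the most recent ID
    }
  } finally {
    await source.close(scrollId); // free server-side resources even on error
  }
  return results;
}

// In-memory stand-in so the loop can be exercised without a cluster.
function fakeSource(total: number): ScrollSource<number> {
  let cursor = 0;
  let size = 0;
  const take = (): Page<number> => {
    const hits: number[] = [];
    for (let i = 0; i < size && cursor < total; i++) hits.push(cursor++);
    return { scrollId: "fake-scroll", hits };
  };
  return {
    open: async (pageSize) => {
      size = pageSize;
      return take();
    },
    next: async () => take(),
    close: async () => {},
  };
}
```

With the stand-in, `await exportAll(fakeSource(25000), 1000)` collects all 25,000 items in batches of 1,000, well past the 10k window, because no single request ever asks for a deep `from` offset.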

cablej, Jan 08 '22 21:01

Thanks! Do you know why the Elastic docs for the scroll API now say "we no longer recommend using the scroll API for deep pagination" though? https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html

Hi @cablej! Just want to make sure this warning doesn't cause any unintended issues when we export >10K entries. Do you think it should still be OK to merge?

epicfaace, Apr 15 '22 21:04

No longer relevant. Closing.

actualeyes, Jul 28 '23 15:07