elasticsearch-py
Misleading helpers.scan preserve_order documentation.
I need to 'scan' with 'preserve_order', and it works well. However, the documentation warns about setting it to true:
don’t set the search_type to scan - this will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.
https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan
That really confused me for a moment.
There is no such thing as search_type=scan in Elasticsearch after 2.x, and the code shows no trace of pagination.
In addition, the most recent ES documentation makes no mention of "unpredictable results". It just says that ordering by anything other than _doc makes scrolling slower. https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-scroll.html
Suggested documentation
Don't set sort to _doc. Note this will make this operation slower.
The documentation should note that providing a query with a sort specified and preserve_order=False will clobber the existing sort.
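To illustrate the clobbering, here is a minimal sketch. It assumes the helper overwrites the `sort` key on a copy of the query body whenever preserve_order is false. `apply_scan_sort` is a hypothetical stand-in written for this comment, not the library's actual function, and the `timestamp` field is made up.

```python
def apply_scan_sort(query, preserve_order=False):
    """Hypothetical stand-in for the sort handling described above."""
    if preserve_order:
        # The caller's sort (if any) is left untouched.
        return query
    # Work on a shallow copy so the caller's dict is not mutated.
    body = dict(query) if query else {}
    # Any existing "sort" key is silently replaced.
    body["sort"] = "_doc"
    return body


# Assumed example query; "timestamp" is an illustrative field name.
user_query = {"query": {"match_all": {}}, "sort": [{"timestamp": "desc"}]}
clobbered = apply_scan_sort(user_query, preserve_order=False)
kept = apply_scan_sort(user_query, preserve_order=True)
```

With preserve_order=False the resulting body sorts by _doc and the timestamp sort is gone; with preserve_order=True the original sort survives.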
It might be a good idea to check whether the query includes a sort when preserve_order is False, and to issue a warning if an existing sort is being clobbered. It took me a while to realize why my sort specification wasn't being honored.
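Something along these lines could work. This is only a sketch of the proposed check, not code from the client: `scan_body_with_warning` and the warning text are hypothetical, and it assumes the query body is a plain dict.

```python
import warnings


def scan_body_with_warning(query, preserve_order=False):
    """Hypothetical helper: build the scroll body, warning on a clobbered sort."""
    # Shallow copy so the caller's query dict is never modified.
    body = dict(query) if query else {}
    if not preserve_order:
        # Warn only when a user-supplied sort is about to be replaced.
        if "sort" in body and body["sort"] != "_doc":
            warnings.warn(
                "preserve_order=False overrides the 'sort' in the query "
                "with '_doc'; pass preserve_order=True to keep it.",
                UserWarning,
                stacklevel=2,
            )
        body["sort"] = "_doc"
    return body
```

A query without a sort, or one already sorted by _doc, passes through without a warning, so existing callers would not see extra noise.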
Closing this old issue. Please reopen if still relevant with recent client software.