elasticsearch-dsl-py

Deep paging

Open: reese-allison opened this issue 2 years ago • 9 comments

Implements deep paging using search_after and a Point in Time (PIT). This can be used to page through all results much more cheaply than the scroll-based scan method. PITs are cheaper to open than scroll contexts, so this should be safe for user-facing requests, and it can serve as a drop-in replacement for scan in many cases.
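For context, each page request in this scheme carries the PIT id (in place of an index), an explicit sort, and, from the second page on, the sort values of the last hit of the previous page as search_after. The sketch below only builds request bodies as plain dicts so it runs without a cluster. The helper name `page_body` is hypothetical, and the explicit `_shard_doc` tiebreaker assumes a recent Elasticsearch (7.12 or later) where that field is available for PIT searches.

```python
from typing import Any, Dict, List, Optional


def page_body(
    pit_id: str,
    size: int,
    sort: List[Dict[str, Any]],
    search_after: Optional[List[Any]] = None,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    """Build one page request for paging through a PIT (hypothetical helper)."""
    body: Dict[str, Any] = {
        "size": size,
        # The PIT id takes the place of an index; keep_alive is renewed on every request.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # _shard_doc breaks ties so every hit has unique sort values.
        "sort": sort + [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


first = page_body("pit-123", 100, [{"timestamp": "asc"}])
# Suppose the last hit of the first page came back with these sort values:
second = page_body("pit-123", 100, [{"timestamp": "asc"}], search_after=[1663999999000, 41])
```

Each hit in a sorted response carries its sort values in hit["sort"]; feeding the last one back as search_after is what advances the cursor, and an empty hits list means every result has been seen.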

Closes #1329

reese-allison, Sep 24 '22 14:09

While iterating through the generator, I get this error:

pit = search._using.open_point_in_time(
AttributeError: 'str' object has no attribute 'open_point_in_time'

It looks like _using is set to 'default'.

Looks like _using defaults to the string 'default'. I'll update the PR.
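The error happens because _using holds a connection alias (a string), not a client object. In elasticsearch-dsl the alias is resolved through elasticsearch_dsl.connections.get_connection(), which looks up a string alias and returns anything that is not a string unchanged. The sketch below imitates that resolution step with an in-memory registry so it runs without a cluster; `Registry` and `FakeClient` are hypothetical stand-ins, not library classes.

```python
from typing import Any, Dict


class FakeClient:
    """Hypothetical stand-in for an Elasticsearch client."""

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        return {"id": f"pit-for-{index}"}


class Registry:
    """Hypothetical alias registry, modelled on elasticsearch_dsl.connections."""

    def __init__(self) -> None:
        self._conns: Dict[str, Any] = {}

    def add(self, alias: str, client: Any) -> None:
        self._conns[alias] = client

    def get_connection(self, alias: Any = "default") -> Any:
        # Already a client object: return it unchanged.
        if not isinstance(alias, str):
            return alias
        try:
            return self._conns[alias]
        except KeyError:
            raise KeyError(f"No connection registered under alias {alias!r}") from None


registry = Registry()
registry.add("default", FakeClient())

using = "default"  # what _using holds when no client was passed in
# Calling using.open_point_in_time(...) directly raises AttributeError on str,
# so resolve the alias to a client first.
es = registry.get_connection(using)
pit = es.open_point_in_time(index="my-index", keep_alive="1m")
```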

reese-allison, Oct 22 '22 20:10

It would be good to merge this PR, since iterating with PIT and search_after is widely used.

I merged in your suggestions.

reese-allison, Oct 22 '22 23:10

Yeah, I suppose it would be good to use search_after to page forward and back rather than just scrolling through all results.

reese-allison, Oct 22 '22 23:10

It would also be interesting to add some sort of pagination with search_after and PIT.

The only way I think this would be possible is to save each page's last_document["sort"] and use those values to build a previous/next context. To get the next page, you pass the current page's last_document["sort"] as search_after; to get the previous page, you reuse the search_after value that fetched the previous page. The only issue is that the page context would last only until the PIT expires.
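A minimal sketch of that idea: keep a stack of the search_after values that produced each page. "Next" pushes the current page's last sort value; "previous" pops back to the cursor that fetched the previous page. To stay runnable without a cluster, `fetch_page` is a hypothetical in-memory stand-in that mimics search_after over pre-sorted (sort value, doc id) pairs with unique sort values, and `PageCursor` is likewise hypothetical. Against a real cluster each fetch would be a PIT query, and the stored cursors would only be valid until the PIT expires.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Hit = Tuple[int, str]  # (sort value, document id)


def fetch_page(data: List[Hit], size: int, search_after: Optional[int]) -> List[Hit]:
    """Hypothetical stand-in for a search_after query over sorted, unique keys."""
    keys = [k for k, _ in data]
    start = 0 if search_after is None else bisect_right(keys, search_after)
    return data[start:start + size]


class PageCursor:
    """Hypothetical prev/next pager built on a stack of search_after values."""

    def __init__(self, data: List[Hit], size: int) -> None:
        self.data = data
        self.size = size
        # cursors[i] is the search_after value that fetched page i (None for page 0).
        self.cursors: List[Optional[int]] = [None]
        self.page = fetch_page(data, size, None)

    def next(self) -> List[Hit]:
        if not self.page:
            return self.page
        cursor = self.page[-1][0]  # last_document["sort"] of the current page
        candidate = fetch_page(self.data, self.size, cursor)
        if candidate:  # only move forward if there is another page
            self.cursors.append(cursor)
            self.page = candidate
        return self.page

    def previous(self) -> List[Hit]:
        if len(self.cursors) > 1:
            self.cursors.pop()  # drop the cursor that fetched the current page
            # Re-run the query with the cursor that fetched the previous page.
            self.page = fetch_page(self.data, self.size, self.cursors[-1])
        return self.page


docs = [(i, f"doc-{i}") for i in range(1, 8)]  # 7 docs, already sorted
pager = PageCursor(docs, size=3)
pager.next()      # doc-4 .. doc-6
pager.next()      # doc-7
pager.previous()  # back to doc-4 .. doc-6
```

Re-running the query for "previous" keeps the stack to one small sort value per page instead of caching page contents, at the cost of one extra request per step back.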

reese-allison, Nov 23 '22 20:11

Hi @reese-allison, I asked a question on issue #1329.

lucasvc, Jan 23 '23 07:01

Hi! This looks really great, and I was hoping to use it in a project I'm working on. Are there plans to merge this sometime soon? Anything I can do to help push it across the finish line? :)

Sbacon017, Mar 19 '24 15:03

I've merged main into this pull request and fixed the conflicts. The only CI failure is due to the usage of the walrus operator. It will go away when #1717 is merged, after which we can review this. Thank you!

pquentin, Mar 20 '24 06:03

@pquentin, thanks for getting this updated for me! I haven't looked at it in a while.

reese-allison, Mar 20 '24 16:03

I will be looking at this PR along with #806 to try to come up with a general approach to pagination. Thanks.

miguelgrinberg, Mar 25 '24 12:03

@reese-allison Thank you so much. Based on your work, I have added the iterate(), point_in_time() and search_after() methods. The first provides the same functionality as your page(). The other two are supporting methods that can be used directly for more specific needs beyond pagination. Thanks!
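The control flow that iterate() combines can be approximated as: open a PIT inside a context manager, keep requesting pages with search_after until a page comes back empty, and close the PIT on exit. The sketch below is a self-contained approximation of that flow, not elasticsearch-dsl's code. `FakeClient` is a hypothetical in-memory stand-in whose method names mirror the Python client's open_point_in_time / close_point_in_time / search, and its search returns bare sort values instead of a real response. Assuming an elasticsearch-dsl release that ships these methods, the library-level call would be roughly `for hit in Search(index="my-index").iterate(): ...`.

```python
from contextlib import contextmanager
from itertools import count
from typing import Dict, Iterator, List, Optional, Set


class FakeClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client."""

    def __init__(self, docs: List[int]) -> None:
        self.docs = sorted(docs)
        self.open_pits: Set[str] = set()
        self._ids = count(1)

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        pit_id = f"pit-{next(self._ids)}"
        self.open_pits.add(pit_id)
        return {"id": pit_id}

    def close_point_in_time(self, id: str) -> None:
        self.open_pits.discard(id)

    def search(self, pit: Dict[str, str], size: int,
               search_after: Optional[List[int]] = None) -> List[int]:
        # Simplified: hits are bare sort values rather than full hit dicts.
        assert pit["id"] in self.open_pits, "PIT is not open"
        after = search_after[0] if search_after else None
        hits = [d for d in self.docs if after is None or d > after]
        return hits[:size]


@contextmanager
def point_in_time(es: FakeClient, index: str, keep_alive: str = "1m") -> Iterator[Dict[str, str]]:
    pit = es.open_point_in_time(index=index, keep_alive=keep_alive)
    try:
        yield {"id": pit["id"], "keep_alive": keep_alive}
    finally:
        # Runs when iteration finishes, and also when the generator is closed early.
        es.close_point_in_time(id=pit["id"])


def iterate(es: FakeClient, index: str, size: int = 2) -> Iterator[int]:
    with point_in_time(es, index) as pit:
        search_after = None
        while True:
            hits = es.search(pit=pit, size=size, search_after=search_after)
            if not hits:  # an empty page means every document has been seen
                break
            yield from hits
            search_after = [hits[-1]]  # sort values of the last hit
```

Wrapping the PIT in a context manager ties its lifetime to the generator, so a caller who stops early can call close() on the generator to release the PIT instead of waiting for keep_alive to lapse.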

miguelgrinberg, May 30 '24 17:05