elasticsearch-dsl-py
Deep paging
Implements deep paging using search_after and Point in Time (PIT). This can be used to page through all results much more cheaply than with the Elasticsearch scan method. PITs are cheaper to open, so this should be safe for user requests and can be used as a drop-in replacement for scan in many cases.
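For readers unfamiliar with the pattern, here is a minimal sketch of the PIT plus search_after loop described above. It is not the code in this PR. `iter_hits` and `FakeClient` are hypothetical names, and the keyword arguments passed to the client (`body=...`) are assumptions that may need adjusting for a given elasticsearch-py version. The fake client only exists so the loop can be run without a cluster.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_hits(client: Any, index: str, query: Dict[str, Any],
              sort: List[Dict[str, str]], page_size: int = 1000,
              keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one page at a time (illustrative only)."""
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after: Optional[List[Any]] = None
    try:
        while True:
            body: Dict[str, Any] = {
                "query": query,
                "sort": sort,  # should give a stable, total ordering
                "size": page_size,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                body["search_after"] = search_after
            response = client.search(body=body)
            hits = response["hits"]["hits"]
            if not hits:
                return
            yield from hits
            # The sort values of the last hit are the cursor for the next page.
            search_after = hits[-1]["sort"]
            # The server may hand back a refreshed PIT id.
            pit_id = response.get("pit_id", pit_id)
    finally:
        # Always release the PIT, even if the caller stops iterating early.
        client.close_point_in_time(body={"id": pit_id})


class FakeClient:
    """In-memory stand-in for an Elasticsearch client (hypothetical, demo only)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self.docs = sorted(docs, key=lambda d: d["id"])
        self.searches = 0
        self.closed = False

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        return {"id": "fake-pit"}

    def search(self, body: Dict[str, Any]) -> Dict[str, Any]:
        self.searches += 1
        after = body.get("search_after")
        docs = [d for d in self.docs if after is None or d["id"] > after[0]]
        page = docs[: body["size"]]
        return {
            "pit_id": body["pit"]["id"],
            "hits": {"hits": [{"_source": d, "sort": [d["id"]]} for d in page]},
        }

    def close_point_in_time(self, body: Dict[str, str]) -> None:
        self.closed = True


client = FakeClient([{"id": i} for i in range(7)])
ids = [
    hit["_source"]["id"]
    for hit in iter_hits(client, "demo", {"match_all": {}}, [{"id": "asc"}], page_size=3)
]
```

The loop stops on the first empty page, so paging through N documents costs one extra search request beyond the pages that contain results.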
Closes #1329
While iterating through the generator, I get this error:
pit = search._using.open_point_in_time(
AttributeError: 'str' object has no attribute 'open_point_in_time'
Looks like using is set to 'default'.
Looks like _using defaults to the string 'default'. I'll update the PR.
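To illustrate the failure mode: a connection alias is just a string, so it has to be resolved to a client object before any client method is called on it. The sketch below uses a hypothetical in-module registry (`_connections`, `add_connection`, `resolve_connection`, and `DummyClient` are all made-up names) rather than the library's own. In elasticsearch-dsl itself, I believe `elasticsearch_dsl.connections.get_connection` does the equivalent lookup, but treat that as an assumption about the fix, not a description of it.

```python
from typing import Any, Dict, Union

# Hypothetical alias registry, standing in for the library's connection registry.
_connections: Dict[str, Any] = {}


def add_connection(alias: str, client: Any) -> None:
    """Register a client object under a string alias."""
    _connections[alias] = client


def resolve_connection(using: Union[str, Any]) -> Any:
    """Return a client whether `using` is an alias or already a client object."""
    if isinstance(using, str):
        if using not in _connections:
            raise KeyError(f"no connection registered under alias {using!r}")
        return _connections[using]
    return using


class DummyClient:
    """Minimal stand-in exposing only the method from the traceback."""

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        return {"id": "dummy-pit"}


client = DummyClient()
add_connection("default", client)

using = "default"  # what the search object holds when no client was passed in
# Calling using.open_point_in_time(...) directly would raise the AttributeError;
# resolving the alias first gives an object that has the method.
pit = resolve_connection(using).open_point_in_time(index="demo", keep_alive="1m")
```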
It would be good to merge this PR, since iteration with PIT and search_after is widely used.
I merged in your suggestions.
Yeah, I suppose it would be good to use search_after to page forward and back rather than just scrolling through all results.
It would also be interesting to add some sort of pagination with search_after and PIT.
The only way I think this would be possible is if we save last_document["sort"] and use it to create a previous/next context. To get the previous page, you would use the previous last_document["sort"]; if you want the next page, you pass the current last_document["sort"]. The only issue with this is that your page context would only last for as long as your PIT is set to expire.
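A rough sketch of that previous/next context, assuming a caller-supplied `fetch_page(search_after)` function that runs the actual PIT search. `PageContext`, `fetch_page`, and the in-memory `DOCS` list are all hypothetical and only there to make the idea concrete. In real use the stored sort values are only meaningful while the PIT is still alive.

```python
from typing import Any, Callable, Dict, List, Optional

Hit = Dict[str, Any]
Cursor = Optional[List[Any]]


class PageContext:
    """Remember the search_after value behind each visited page (illustrative)."""

    def __init__(self, fetch_page: Callable[[Cursor], List[Hit]]) -> None:
        self._fetch_page = fetch_page
        self._starts: List[Cursor] = []   # cursor that produced each visited page
        self._last_sort: Cursor = None    # last_document["sort"] of the current page

    def _load(self, start: Cursor) -> List[Hit]:
        hits = self._fetch_page(start)
        self._last_sort = hits[-1]["sort"] if hits else None
        return hits

    def next_page(self) -> List[Hit]:
        if self._starts and self._last_sort is None:
            return []  # the current page was empty, so nothing follows it
        start = self._last_sort if self._starts else None
        self._starts.append(start)
        return self._load(start)

    def previous_page(self) -> List[Hit]:
        if len(self._starts) < 2:
            raise IndexError("already on the first page")
        self._starts.pop()  # forget the current page
        return self._load(self._starts[-1])


# Demo data standing in for search results sorted on a single numeric field.
DOCS: List[Hit] = [{"_id": str(i), "sort": [i]} for i in range(1, 8)]


def fetch_page(search_after: Cursor, size: int = 3) -> List[Hit]:
    remaining = [d for d in DOCS if search_after is None or d["sort"] > search_after]
    return remaining[:size]


ctx = PageContext(fetch_page)
first = ctx.next_page()
second = ctx.next_page()
back = ctx.previous_page()
```

Storing the cursor that produced each page, rather than only the last one, is what makes stepping back more than one page possible.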
Hi @reese-allison, I asked a question on issue #1329.
Hi! This looks really great, and I was hoping to use it in a project I'm working on. Are there plans to merge this sometime soon? Anything I can do to help push it across the finish line? :)
I've merged main into this pull request and fixed the conflicts. The only CI failure is due to the usage of the walrus operator. It will go away when #1717 is merged, after which we can review this. Thank you!
@pquentin, thanks for getting this updated for me! I haven't looked at it in a while.
I will be looking at this PR along with #806 to try to come up with a general approach to pagination. Thanks.
@reese-allison Thank you so much. Based on your work I have added iterate(), point_in_time() and search_after() methods. The first provides the same functionality as your page(). The other two are supporting methods that can be used directly for more specific needs beyond pagination. Thanks!