elasticsearch-dsl-py

Allow for `scan` with aggregations

Open honzakral opened this issue 7 years ago • 12 comments

As of 5.0, Elasticsearch allows a search request with aggregations when using scan/scroll, which we should expose.

This has been moved here from elasticsearch-py - https://github.com/elastic/elasticsearch-py/issues/530

honzakral avatar Feb 07 '17 20:02 honzakral

To be clear - this is not about using a scrolled query to fetch the aggregations bit by bit. This is about having a query with both aggregations and hits, and you want to use a scrolled query for the hits while still seeing the aggregations.

Currently, there's no way to do this: Search.execute() doesn't do scrolled queries, while Search.scan() only returns an iterator over the hits with no way to access the aggregation results.

My proposal is to add a new method, Search.execute_scan(), which returns a Response object like Search.execute(), but the Response.hits property, instead of being a static list, is a Search.scan()-style iterator.

macdjord avatar Feb 07 '17 20:02 macdjord

I'd also like to see this happen -- if this isn't a priority right now, do you have a suggested workaround for the time being, @HonzaKral? Perhaps even one that uses the underlying elasticsearch-py package?

nguyening avatar Jun 16 '17 00:06 nguyening

I think the proper solution is to create a custom Response class that will hide this - it will provide standard access to the aggregations, but iterating over its .hits attribute will iterate over all the documents (just like iterating over scan() currently works). Exactly as @macdjord said!

This will make it compatible with the standard response.
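
Until something like that exists, the behaviour described above (aggregations from the first response, then a scroll over the hits) can be approximated with the low-level client. The following is only a rough sketch, not the library's implementation: the helper name `search_with_aggs_then_scroll` is made up for illustration, and it assumes `client` is an elasticsearch-py style client whose `search`, `scroll` and `clear_scroll` methods accept these keyword arguments and return dict-like responses (newer client versions prefer top-level parameters over `body=`).

```python
def search_with_aggs_then_scroll(client, index, body, scroll="5m", size=1000):
    """Return (aggregations, iterator over all hits).

    Hypothetical helper: assumes an elasticsearch-py style client with
    search(), scroll() and clear_scroll() returning dict-like responses.
    """
    # The initial scrolled search is the only response that carries
    # the aggregation results.
    first = client.search(index=index, body=body, scroll=scroll, size=size)
    aggregations = first.get("aggregations", {})

    def iter_hits():
        page = first
        scroll_id = page.get("_scroll_id")
        try:
            # Keep scrolling until a page comes back with no hits.
            while page["hits"]["hits"]:
                yield from page["hits"]["hits"]
                if scroll_id is None:
                    break
                page = client.scroll(scroll_id=scroll_id, scroll=scroll)
                # The scroll id may change between pages, so track the latest.
                scroll_id = page.get("_scroll_id", scroll_id)
        finally:
            # Release the server-side scroll context once iteration ends.
            if scroll_id is not None:
                client.clear_scroll(scroll_id=scroll_id)

    return aggregations, iter_hits()
```

With a `Search` object `s`, the request body would be `s.to_dict()`. Note that the scroll context is only cleared once the hits iterator is started and then exhausted or closed.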

honzakral avatar Nov 08 '17 21:11 honzakral

I'm not sure if this belongs here, but the problem I am facing is that the DSL library seems unable to return more than 10 aggregation results. Please do correct me if I am wrong, but slicing seems to work only for hits rather than aggregations.

If the "scan" for aggregations can be implemented, I am sure it would be extremely helpful. Meanwhile, for any others who might be facing the problem of only 10 aggregation results in the DSL library, hopefully the workaround here can prove helpful.

Perhaps we could also look at allowing pagination for aggregations?

qiujunda avatar Apr 17 '18 13:04 qiujunda

@qiujunda scan with aggregations still doesn't scan through the aggregations, it just runs the aggregations first and then proceeds to scan through the documents.

For most aggregations you can already set the size parameter to get back more than 10 buckets. To paginate through all possible buckets you need to use the composite aggregation, which unfortunately is still not supported in elasticsearch-dsl: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-composite-aggregation.html
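
To illustrate where that size parameter goes, here is a small sketch that builds a raw request body as a plain dict. The function name, the aggregation name `by_tag` and the field `tags` are made up for illustration. The top-level `size` (which slicing a `Search` controls) only limits hits, while the `size` inside the `terms` aggregation controls how many buckets come back.

```python
def terms_agg_request(agg_name, field, bucket_size):
    """Build a search body that returns no hits and up to bucket_size buckets.

    Hypothetical helper for illustration; the dict follows the standard
    Elasticsearch request body layout for a terms aggregation.
    """
    return {
        "size": 0,  # top-level size: number of hits, unrelated to buckets
        "aggs": {
            agg_name: {
                # size inside the aggregation: number of buckets returned
                "terms": {"field": field, "size": bucket_size},
            }
        },
    }


body = terms_agg_request("by_tag", "tags", 500)
```

In elasticsearch-dsl the same parameter can be passed as a keyword argument when defining the aggregation, for example `s.aggs.bucket("by_tag", "terms", field="tags", size=500)`.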

honzakral avatar Apr 17 '18 16:04 honzakral

@HonzaKral is there any update on being able to return aggregates with the scan response?

mcinnes01 avatar Oct 05 '18 09:10 mcinnes01

This is a pretty big deal tbh. I would like to see this implemented.

Skaldenmet avatar Feb 19 '19 14:02 Skaldenmet

Any update on this functionality to scan through aggregation results?

darshan2203 avatar May 01 '19 01:05 darshan2203

Same here!

iDmple avatar Jul 24 '19 13:07 iDmple

Just to clarify: this is not about scanning through the results of an aggregation, just returning the aggregations first and then scanning through the documents.

To "scan" over aggregations you can use the composite aggregation as shown here - https://github.com/elastic/elasticsearch-dsl-py/blob/master/examples/composite_agg.py
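
For readers who want the general shape without opening the linked example, paging through a composite aggregation looks roughly like this. This is a sketch against the low-level client, not a copy of the linked file: the helper name and the aggregation name `comp` are made up, and it assumes `client.search` accepts `index=` and `body=` and returns a dict-like response.

```python
def iter_composite_buckets(client, index, sources, page_size=100, agg_name="comp"):
    """Yield every bucket of a composite aggregation, one page at a time.

    Hypothetical helper: `sources` is a list such as
    [{"tag": {"terms": {"field": "tags"}}}].
    """
    after = None
    while True:
        composite = {"sources": sources, "size": page_size}
        if after is not None:
            # Resume after the last bucket of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        response = client.search(index=index, body=body)
        agg = response["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            break
        yield from buckets
        # Elasticsearch reports the key to continue from as "after_key".
        after = agg.get("after_key")
        if after is None:
            break
```

Each request is an ordinary search with `size` set to 0, so no scroll context is involved; the `after` key is the only state carried between pages.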

honzakral avatar Jul 24 '19 15:07 honzakral

I'm not sure if the problem I'm facing is related to this, but I'm unable to get the inner_hits from a scan response. Any suggestions would be appreciated.

Regards!

mateoSerna avatar Oct 07 '20 17:10 mateoSerna

Any suggestions on how to get the aggregations from the response returned by scan()?

Kosmonafft avatar Jan 05 '24 16:01 Kosmonafft