elasticsearch
ESQL: INLINESTATS followup
Description
https://github.com/elastic/elasticsearch/pull/109583 will add support for INLINESTATS, a command to run a STATS and then merge the results into the stream of results. This issue tracks follow up work:
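For readers new to the command, the "run a STATS, then merge the results into the stream" behaviour can be sketched in plain Python. This only illustrates the semantics and is not how ES|QL implements it: the `inline_stats` function and the rows-as-dicts representation are made up for the example, and null handling is ignored.

```python
from collections import defaultdict


def inline_stats(rows, agg_name, agg_fn, field, by):
    """Hypothetical helper: aggregate `field` per `by` group, then merge back."""
    # Phase 1: the STATS part, one aggregated value per group key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[field])
    stats = {key: agg_fn(values) for key, values in groups.items()}

    # Phase 2: merge the per-group value into every original row.
    # Unlike STATS, the number of rows does not change.
    return [{**row, agg_name: stats[row[by]]} for row in rows]


rows = [
    {"host": "a", "foo": 1},
    {"host": "a", "foo": 3},
    {"host": "b", "foo": 10},
]
result = inline_stats(rows, "avg_foo", lambda v: sum(v) / len(v), "foo", "host")
```

Each input row comes back with an extra `avg_foo` column holding the average for its `host` group.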
Before GA
- [x] Run logical plan optimization before splitting the phases. See https://github.com/elastic/elasticsearch/pull/109583#discussion_r1683084852
- [x] Allow functions in the grouping position (+union types test with conversion in the grouping position: `// TODO once inlinestats supports expressions in groups we'll likely need the same sort of extraction here`)
- [ ] Address left over comments on https://github.com/elastic/elasticsearch/pull/111690
- [ ] Fix union fields in INLINESTATS - they are allowed now but they produce "fun" error messages and don't work
- [x] Add all phases to the result of `profile`
- [ ] Allow push down for conditions coming from the EVAL based join. So `| INLINESTATS a=AVG(foo) | WHERE foo > a` should be able to push the `foo > a` bit in the second phase. It can't now.
- [ ] Track intermediate request memory usage and lift the 1mb limit.
- [ ] Keep readers open for the second round.
- [ ] Fix the test labeled `brokenWhy-Ignore`, `byConstant-Ignored`
- [ ] More shadowing tests
- [ ] Fix `INLINESTATS x=MAX(a), x=MIN(a)` - `shadowingInternal-Ignored`
- [ ] Fix `shadowingSelfBySelf`
- [ ] Fix `INLINESTATS` in CCS
- [ ] Once we have the above - we should look into pushing the `Phased` stuff further into physical planning. It'd be nice to, for example, add a `SubqueryExec` plan that runs like `Phased` does here. Not sure if physical or logical - but physical feels better. We're doing logical now though.
- [ ] Test with the `BUCKET` function. Sounds like it doesn't work at the moment.
- [ ] More than one INLINESTATS. see. Note that this might have to do with multiple lookups - that's tracked in https://github.com/elastic/elasticsearch/issues/109353
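To make the push-down item in the list above concrete, here is a minimal Python sketch of the ungrouped case, `INLINESTATS a=AVG(foo)` followed by `WHERE foo > a`. All function names are hypothetical and rows are modelled as dicts; it only shows why the filter is safe to evaluate in the second phase, not how the planner would do it.

```python
def phase_one_avg(rows):
    """First phase: compute the single aggregate value a = AVG(foo)."""
    values = [row["foo"] for row in rows]
    return sum(values) / len(values)


def second_phase_filter_after(rows, a):
    """No push-down: attach `a` to every row, then filter."""
    joined = [{**row, "a": a} for row in rows]
    return [row for row in joined if row["foo"] > row["a"]]


def second_phase_pushed_down(rows, a):
    """Push-down: `a` is a known constant by now, so test foo > a while reading."""
    return [{**row, "a": a} for row in rows if row["foo"] > a]


rows = [{"foo": 1}, {"foo": 2}, {"foo": 6}]
a = phase_one_avg(rows)
without_pushdown = second_phase_filter_after(rows, a)
with_pushdown = second_phase_pushed_down(rows, a)
```

Both variants return the same rows; the pushed-down one just never materializes the rows that the filter would discard.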
Eventually
- [ ] Some of https://github.com/elastic/elasticsearch/issues/109353
Pinging @elastic/es-analytical-engine (Team:Analytics)
Some of https://github.com/elastic/elasticsearch/issues/110923 need to happen before GA of INLINESTATS. Some need to happen after. Some are entirely unrelated.
Maybe we should also consider an optimization for the case where the output columns of INLINESTATS are actually unused (e.g. DROPped): then we don't need to perform the INLINESTATS, or a 2-phase query, at all.
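A rough Python sketch of that idea, assuming a made-up plan representation in which each command lists the columns it defines and the columns it reads. `prune_unused_inline_stats` is a hypothetical name, not an existing optimizer rule. Removing the command is only safe because INLINESTATS never changes the number of rows.

```python
def prune_unused_inline_stats(pipeline, final_columns):
    """Drop INLINESTATS commands whose output columns nothing downstream reads."""
    needed = set(final_columns)
    kept = []
    # Walk the pipeline backwards, tracking which columns are still needed.
    for cmd in reversed(pipeline):
        if cmd["op"] == "INLINESTATS" and not (cmd["defines"] & needed):
            continue  # outputs are dead: skip the command and its extra phase
        needed = (needed - cmd["defines"]) | cmd["uses"]
        kept.append(cmd)
    kept.reverse()
    return kept


inline = {"op": "INLINESTATS", "defines": {"a"}, "uses": {"foo"}}
drop_a = {"op": "DROP", "defines": set(), "uses": set()}
where = {"op": "WHERE", "defines": set(), "uses": {"foo", "a"}}

# `a` is dropped and never read: the INLINESTATS can go.
pruned = prune_unused_inline_stats([inline, drop_a], final_columns={"foo"})
# `a` is read by the WHERE: the INLINESTATS has to stay.
unchanged = prune_unused_inline_stats([inline, where], final_columns={"foo", "a"})
```

A real rule would also have to clean up the now-dangling `DROP a`; the sketch leaves that out.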
Seeing a couple of different failures. I see variants of these with LOOKUP as well, so that might be related. I've noticed that I can sometimes work around them by adding `| LIMIT 1000000` right after the LOOKUP or INLINESTATS. Explicitly KEEPing fields also sometimes resolves the issue.
FROM .entities*instance*,.alerts*,.slos*
| EVAL _entity_id_type_hosts = CASE(host.name IS NOT NULL, ":hosts", NULL)
| EVAL _entity_id_type_host = host.name
| INLINESTATS _unique_alerts_type_hosts = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_hosts
| INLINESTATS _unique_alerts_type_host = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_host
| STATS _alerts_count_hosts = SUM(_unique_alerts_type_hosts) BY entity.id
results in:
"class org.elasticsearch.compute.data.LongArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock (org.elasticsearch.compute.data.LongArrayBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @43120a77)
{
  "error": {
    "root_cause": [
      {
        "type": "class_cast_exception",
        "reason": "class org.elasticsearch.compute.data.LongArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock (org.elasticsearch.compute.data.LongArrayBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @43120a77)"
      }
    ],
    "type": "class_cast_exception",
    "reason": "class org.elasticsearch.compute.data.LongArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock (org.elasticsearch.compute.data.LongArrayBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @43120a77)",
    "suppressed": [
      {
        "type": "exception",
        "reason": "1 further exceptions were dropped"
      },
      {
        "type": "task_cancelled_exception",
        "reason": "cancelled on failure"
      }
    ]
  },
  "status": 500
}
Adding a KEEP results in a different error:
FROM .entities*instance*,.alerts*,.slos*
| KEEP host.name, service.name, kibana.alert.uuid, entity.id
| EVAL _entity_id_type_hosts = CASE(host.name IS NOT NULL, ":hosts", NULL)
| EVAL _entity_id_type_host = host.name
| INLINESTATS _unique_alerts_type_hosts = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_hosts
| INLINESTATS _unique_alerts_type_host = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_host
| STATS _alerts_count_hosts = SUM(_unique_alerts_type_hosts) BY entity.id
Index 10 out of bounds for length 8
{
  "error": {
    "root_cause": [
      {
        "type": "array_index_out_of_bounds_exception",
        "reason": "Index 10 out of bounds for length 8"
      }
    ],
    "type": "array_index_out_of_bounds_exception",
    "reason": "Index 10 out of bounds for length 8",
    "suppressed": [
      {
        "type": "exception",
        "reason": "2 further exceptions were dropped"
      },
      {
        "type": "task_cancelled_exception",
        "reason": "cancelled on failure"
      },
      {
        "type": "task_cancelled_exception",
        "reason": "parent task was cancelled [cancelled on failure]",
        "suppressed": [
          {
            "type": "task_cancelled_exception",
            "reason": "parent task was cancelled [cancelled on failure]"
          },
          {
            "type": "exception",
            "reason": "1 further exceptions were dropped"
          },
          {
            "type": "task_cancelled_exception",
            "reason": "parent task was cancelled [cancelled on failure]"
          }
        ]
      }
    ]
  },
  "status": 500
}
Heya, all work items from this meta issue have been extracted into GH issues. I'd like to close this meta issue in favor of internal tracking, if that's okay.
@alex-spies/Team, do we know when INLINESTATS will be GA, please? We are one of the large enterprise customers, and this is blocking a lot of our migration to ESQL. The team is slowly moving to ad hoc tools as a workaround, because it has been a long time since INLINESTATS "availability soon" was posted in https://www.elastic.co/search-labs/blog/esql-piped-query-language-goes-ga.
Hi @getkub apologies for the delay, I understand this can be challenging. It actually serves as an example of why we are not permitted to state when a feature will be released, because plans can change. At the time of that blog we did think INLINE STATS was coming out soon. We made a shift of priorities to Joins, and have now released the Lookup Join in 8.18 (and Lookup Join will be GA in the near term). Unfortunately the join work actually broke INLINE STATS which set it back further.
Rest assured we recognize the need for INLINE STATS and are once again actively working on it so it can be released as a Tech Preview in the short term.
@tylerperk, thank you for the update. These kinds of updates help us prioritise the work and push back on other stakeholders. We are under huge pressure from developers to move to other toolsets in AWS, given the lack of updates on core ESQL functionality like INLINESTATS and reindex, and with joins having to be done in other software.