kibana [Lens] optimize duplicate formula functions

Summary

Part of https://github.com/elastic/kibana/issues/135265

Optimizes semantically duplicate formula functions of the following types:

average
count
last_value
max
median
min
standard_deviation
sum
unique_count

Left to do

percentile rank
filtered percentiles

This PR also improves how we update order-by references from terms aggregations for percentile operations. Instead of anticipating the optimization in the toEsAggsFn method on the terms operation class, we update the order-bys as part of the AST transformations themselves in the optimizeEsAggs method on the percentile operation class.

Testing

Create a visualization and add a dimension with this formula:

median(bytes) + median(bytes) +
sum(bytes) + sum(bytes) +
max(hour_of_day) + max(hour_of_day) +
average(bytes) + average(bytes) +
standard_deviation(bytes) + standard_deviation(bytes) +
min(machine.ram, shift='1h') + min(machine.ram, shift='1h') +
min(machine.ram, shift='2h') + min(machine.ram, shift='2h')

Check the Elasticsearch request in the inspector. There should only be 7 aggregations, not 14.
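As a rough sanity check on the expected count, the following TypeScript sketch counts the textually distinct function calls in the formula above. It is an illustration only: it compares call text with a regular expression, whereas Lens deduplicates on the parsed operations, and the variable names are made up for this example.

```typescript
// The test formula from this PR description.
const formula = `
median(bytes) + median(bytes) +
sum(bytes) + sum(bytes) +
max(hour_of_day) + max(hour_of_day) +
average(bytes) + average(bytes) +
standard_deviation(bytes) + standard_deviation(bytes) +
min(machine.ram, shift='1h') + min(machine.ram, shift='1h') +
min(machine.ram, shift='2h') + min(machine.ram, shift='2h')
`;

// Match each call such as min(machine.ram, shift='1h').
// This simple pattern only works because none of the calls here are nested.
const calls: string[] = formula.match(/[a-z_]+\([^)]*\)/g) ?? [];

// Strip whitespace so calls that differ only in spacing compare equal.
const uniqueCalls = new Set(calls.map((call) => call.replace(/\s+/g, "")));

console.log(calls.length, uniqueCalls.size); // 14 7
```

Each distinct call corresponds to one aggregation that should remain in the request. The two `min` calls with different `shift` values stay separate, which is why the total is 7 rather than 6.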

Then try this formula:

sum(bytes) + sum(bytes) +
sum(bytes, kql='geo.dest: "GA" ') + sum(bytes, kql='geo.dest: "GA" ') +
sum(bytes, kql='geo.dest: "AL" ') + sum(bytes, kql='geo.dest: "AL" ') +
sum(bytes, lucene='geo.dest: "AL" ') + sum(bytes, lucene='geo.dest: "AL" ') +
sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m') + sum(bytes, kql='geo.dest: "AL" ', reducedTimeRange='1m')

Checklist

Delete any items that are not applicable to this PR.

[ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
[ ] Documentation was added for features that require explanation or tutorials
[ ] Unit or functional tests were updated or added to match the most common scenarios
[ ] Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
[ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
[ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
[ ] This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
[ ] This was checked for cross-browser compatibility

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk	Probability	Severity	Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space.	Low	High	Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks.	High	Low	Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled.	Medium	High	Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

[ ] This was checked for breaking API changes and was labeled appropriately

Sep 15 '22 19:09 drewdaemon

@elasticmachine merge upstream

Sep 19 '22 20:09 drewdaemon

@flash1293 Thinking more about how much flexibility to give the operation classes for optimizing...

All the simple “dedupe” optimizations are turning out to follow the same set of steps

group duplicates
remove all but one agg from each group
update the idMap to map the single agg to all the original columns
update terms agg order-by references

The only step here that is really specific to an operation type is deciding which aggs are duplicate.

So, we could extend the operation class with a method called something like getGroupByKey(agg) and have the datasource's to_expression take care of the rest. We could leave the optimizeEsAggs method in place for more complicated optimization scenarios such as we do with the percentiles. That way, it’s a lot cheaper to take advantage of the most common optimization, but there’s still flexibility.
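The four steps above, combined with a per-operation grouping key, might look roughly like the sketch below. Everything in it is hypothetical: the `Agg` and `TermsAgg` shapes, the `dedupeAggs` helper, and the way `idMap` maps an agg id to column ids are simplified stand-ins, not the actual Lens types or the `to_expression` code.

```typescript
// Hypothetical, simplified shapes used only for this illustration.
interface Agg {
  id: string;
  fn: string; // e.g. "sum"
  field: string;
  shift?: string;
  filter?: string;
}

interface TermsAgg {
  orderBy: string; // id of the metric agg this terms agg orders by
}

// The only operation-specific step: decide which aggs count as duplicates.
function getGroupByKey(agg: Agg): string {
  return JSON.stringify([agg.fn, agg.field, agg.shift ?? null, agg.filter ?? null]);
}

function dedupeAggs(aggs: Agg[], idMap: Record<string, string[]>, termsAggs: TermsAgg[]) {
  // 1. Group duplicates by key.
  const groups = new Map<string, Agg[]>();
  for (const agg of aggs) {
    const key = getGroupByKey(agg);
    groups.set(key, [...(groups.get(key) ?? []), agg]);
  }

  const keptAggs: Agg[] = [];
  const newIdMap: Record<string, string[]> = {};
  const replacedBy = new Map<string, string>();

  groups.forEach((group) => {
    // 2. Keep only the first agg of each group.
    const kept = group[0];
    keptAggs.push(kept);
    // 3. Map the kept agg to every column the group originally served.
    newIdMap[kept.id] = group.reduce<string[]>(
      (cols, agg) => cols.concat(idMap[agg.id] ?? []),
      []
    );
    group.slice(1).forEach((removed) => replacedBy.set(removed.id, kept.id));
  });

  // 4. Point terms order-by references at the surviving agg.
  const newTermsAggs = termsAggs.map((terms) => ({
    ...terms,
    orderBy: replacedBy.get(terms.orderBy) ?? terms.orderBy,
  }));

  return { aggs: keptAggs, idMap: newIdMap, termsAggs: newTermsAggs };
}

// Usage: two identical sums collapse into one, and the order-by follows.
const result = dedupeAggs(
  [
    { id: "a", fn: "sum", field: "bytes" },
    { id: "b", fn: "sum", field: "bytes" },
    { id: "c", fn: "max", field: "bytes" },
  ],
  { a: ["col1"], b: ["col2"], c: ["col3"] },
  [{ orderBy: "b" }]
);
console.log(result.idMap); // { a: [ 'col1', 'col2' ], c: [ 'col3' ] }
```

In this shape, an operation would only need to supply its own `getGroupByKey`, while anything more involved (such as the percentile case) could still go through `optimizeEsAggs`.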

Any thoughts?

Sep 21 '22 14:09 drewdaemon

@andrewctate This makes sense to me

Sep 21 '22 14:09 flash1293

@elasticmachine merge upstream

Sep 22 '22 00:09 drewdaemon

Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)

Sep 22 '22 02:09 elasticmachine

@elasticmachine merge upstream

Sep 23 '22 18:09 drewdaemon

@kibanamachine merge upstream

Sep 26 '22 13:09 drewdaemon

:yellow_heart: Build succeeded, but was flaky

Buildkite Build
Commit: a8e560da1bee523cb0af29b2ef46517342e3ee4e

Failed CI Steps

Rules, Alerts and Exceptions ResponseOps Cypress Tests on Security Solution

Test Failures

[job] [logs] Rules, Alerts and Exceptions ResponseOps Cypress Tests on Security Solution / Alerts detection rules table auto-refresh should disable auto refresh when any rule selected and enable it after rules unselected

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id	before	after	diff
`lens`	906	908	+2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

id	before	after	diff
`data`	2506	2508	+2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`lens`	1.2MB	1.2MB	+2.2KB

Unknown metric groups

API count

id	before	after	diff
`data`	3209	3211	+2

ESLint disabled line counts

id	before	after	diff
`lens`	25	26	+1

Total ESLint disabled count

id	before	after	diff
`lens`	28	29	+1

History

:broken_heart: Build #75534 failed 845822faea46d777472da2b10a7da090bc00f258
:yellow_heart: Build #75407 was flaky 5337c5320d09d7b1d53e301982bb6dc674af7836
:broken_heart: Build #75381 failed 5304126f913d65991064c9c58825291ee6a27034
:broken_heart: Build #75348 failed 68cc1719247f66ad0409a149bea79f293cc541ae
:green_heart: Build #74678 succeeded 38e0ce378ab3479c9d7ead9187c72fc97d9adcfb
:broken_heart: Build #74663 failed c0104251b9ef4863ca34edbebd5294442ebb9354

To update your PR or re-run it, just comment with: @elasticmachine merge upstream

Sep 26 '22 15:09 kibana-ci
