Derived analytics
The following text appeared in a retired "lake design" document (see #3803). It has since been established that the text describes a pending to-do rather than existing functionality, so this issue tracks its eventual implementation and the corresponding updates to the docs.
## Derived Analytics
To improve the performance of predictable workloads, many Zed lake use
cases pre-compute _derived analytics_, or a particular set of _partial
aggregations_.
For example, the Brim app displays a histogram of event counts grouped by
a category over time. The partial aggregation for such a computation can be
configured to run automatically and store the result in a pool designed to
hold such results. Then, when a scan is run, the Zed analytics engine
recognizes when the DAG of a query can be rewritten to assemble the
partial results instead of deriving the answers from scratch.
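
As a rough illustration of the idea (not Zed's implementation), the Python sketch below pre-computes event counts per fine-grained time bucket and category, then assembles a coarser histogram by summing those partial counts instead of rescanning the raw events. The function names, the `(timestamp, category)` event shape, and the 60-second bucket size are all assumptions made for this example.

```python
from collections import Counter

BUCKET = 60  # seconds; assumed granularity of the stored partial results


def partial_counts(events, bucket=BUCKET):
    """Pre-compute counts keyed by (bucket start, category).

    `events` is an iterable of (timestamp_seconds, category) pairs.
    """
    partials = Counter()
    for ts, category in events:
        partials[(ts - ts % bucket, category)] += 1
    return partials


def histogram_from_partials(partials, span, bucket=BUCKET):
    """Assemble a coarser histogram by summing partial counts.

    `span` must be a multiple of `bucket` so that every partial bucket
    falls entirely inside one output bucket.
    """
    if span % bucket != 0:
        raise ValueError("span must be a multiple of the partial bucket size")
    out = Counter()
    for (start, category), n in partials.items():
        out[(start - start % span, category)] += n
    return out


def histogram_from_scratch(events, span):
    """Reference answer computed directly from the raw events."""
    return partial_counts(events, bucket=span)


# Hypothetical raw events: (timestamp in seconds, category).
events = [(0, "dns"), (30, "dns"), (65, "http"),
          (130, "dns"), (3600, "http"), (3661, "dns")]
partials = partial_counts(events)
hourly = histogram_from_partials(partials, span=3600)
```

Counts can be combined this way because summation is associative. Aggregations such as averages would need to store richer partial state (for example, a sum and a count) to be assembled correctly.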
When and how such partial aggregations are performed is simply a matter of
writing Zed queries that take the raw data and produce the derived analytics
while conforming to a naming model that allows the Zed lake to recognize
the relationship between the raw data and the derived data.
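
Since the naming model is still to be worked out, the following Python sketch only illustrates what such a convention might look like. The `<raw>.derived.<agg>.by-<keys>.every-<N>s` pattern, the function names, and the pool names are all hypothetical and are not part of Zed.

```python
def derived_pool_name(raw_pool, agg, keys, bucket):
    """Build a derived pool name from the raw pool and the aggregation shape.

    Hypothetical convention: <raw>.derived.<agg>.by-<keys>.every-<bucket>s
    """
    return f"{raw_pool}.derived.{agg}.by-{'-'.join(sorted(keys))}.every-{bucket}s"


def find_partials(existing_pools, raw_pool, agg, keys, span):
    """Return (pool name, bucket size) of a derived pool that can answer a
    query with time span `span`, or None if no stored partials apply."""
    prefix = f"{raw_pool}.derived.{agg}.by-{'-'.join(sorted(keys))}.every-"
    candidates = []
    for name in existing_pools:
        if name.startswith(prefix) and name.endswith("s"):
            size = name[len(prefix):-1]
            # Usable only if the query span is a whole number of buckets.
            if size.isdigit() and int(size) > 0 and span % int(size) == 0:
                candidates.append((int(size), name))
    if not candidates:
        return None
    size, name = max(candidates)  # coarsest usable partials: fewest rows to sum
    return name, size


# Hypothetical set of pools in a lake.
pools = {
    "logs",
    derived_pool_name("logs", "count", ["category"], 60),
    derived_pool_name("logs", "count", ["category"], 300),
}
match = find_partials(pools, "logs", "count", ["category"], span=3600)
```

Under this assumed convention, a planner could decide purely from pool names whether a query over `logs` can be rewritten to read pre-computed partials, and fall back to scanning the raw pool when `find_partials` returns `None`.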
> TBD: Work out these details, which are reminiscent of the analytics cache
> developed in our earlier prototype.