
per-row size limits on aggregators

Open mccanne opened this issue 5 years ago • 4 comments

There should be a maxrowsize option on aggregations that also influences the max-read-size buffer in zngio. Maximum row size is more intuitive and is the real bottleneck here. When we implement this in aggregators, we should add a command-line option to zq to go with the maxreadsize flag.

mccanne avatar Dec 14 '20 03:12 mccanne

We should have a general limit on aggregation outputs, perhaps both in terms of "length", i.e., the number of contained values, as well as in terms of the bytes occupied by the row, e.g.,

union(query) by id.resp_h with limit=10

which would limit each row to 10 set elements or

union(query) by id.resp_h with limit=5MiB

which would (roughly) limit each row to 5MiB of row memory footprint.
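The enforcement logic behind both forms of the proposed limit might look like the following Go sketch. Everything here (the `unionAgg` type, its fields, and the clipping behavior) is hypothetical illustration of the idea, not the actual zed implementation:

```go
package main

import "fmt"

// unionAgg sketches a union() aggregator with hypothetical per-row limits:
// maxLen caps the number of distinct elements (e.g., "with limit=10") and
// maxBytes roughly caps the memory footprint (e.g., "with limit=5MiB").
type unionAgg struct {
	set      map[string]struct{}
	bytes    int
	maxLen   int
	maxBytes int
	clipped  bool // set when a limit prevented an insertion
}

func newUnionAgg(maxLen, maxBytes int) *unionAgg {
	return &unionAgg{set: map[string]struct{}{}, maxLen: maxLen, maxBytes: maxBytes}
}

// Consume adds a distinct value unless doing so would exceed either limit,
// in which case the row is marked as clipped and the value is dropped.
func (u *unionAgg) Consume(v string) {
	if _, ok := u.set[v]; ok {
		return
	}
	if (u.maxLen > 0 && len(u.set) >= u.maxLen) ||
		(u.maxBytes > 0 && u.bytes+len(v) > u.maxBytes) {
		u.clipped = true
		return
	}
	u.set[v] = struct{}{}
	u.bytes += len(v)
}

func main() {
	agg := newUnionAgg(10, 0)
	for i := 0; i < 25; i++ {
		agg.Consume(fmt.Sprintf("query-%d", i))
	}
	fmt.Println(len(agg.set), agg.clipped) // the set is capped at 10 and flagged as clipped
}
```

The `clipped` flag is the hook for making the limit visible to the user rather than silently truncating, which connects to the error-wrapping idea discussed later in this thread.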

mccanne avatar Dec 15 '20 19:12 mccanne

This supersedes issue #1494.

mccanne avatar Dec 28 '20 16:12 mccanne

Moving to the backlog until we confirm this is implemented.

mccanne avatar Dec 14 '21 13:12 mccanne

This is a note-to-self because I seem prone to forgetting the history around wide rows/values, and I often wonder whether we could do some kind of spill-to-disk rather than requiring limits and knobs.

In a recent chat @mccanne explained that our design currently assumes a value can fit in memory, so there has to be some kind of circuit breaker if a user's program tries to build a single value larger than memory allows, as can happen with these aggregations. He also explained that this is how other data warehouses work today. To make our circuit breaker friendly, we've considered wrapping the result in some kind of error so the user gets the benefit of partial results in addition to being made aware that they've exceeded limits.

philrz avatar Aug 17 '22 19:08 philrz