newWindowGroupKey produces a lot of memory allocations

Open jsternberg opened this issue 6 years ago • 8 comments

The newWindowGroupKey method inside the window() transformation produces a lot of memory allocations when there are a lot of groups. This is because it ends up producing N*M group keys, where N is the number of tables being processed and M is the number of window intervals.
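For illustration only (this is not code from the Flux repository), here is a minimal Go sketch of why the key count grows as N*M. The `groupKey` type and `windowKeys` function are hypothetical stand-ins; the real group key carries columns and values rather than a single string.

```go
package main

import "fmt"

// groupKey is a hypothetical, simplified stand-in for a Flux group key:
// the series identity plus the window bounds added by window().
type groupKey struct {
	series      string
	start, stop int64
}

// windowKeys builds one key per (series, window) pair, so the result
// has len(series) * numberOfWindows entries.
func windowKeys(series []string, start, stop, every int64) []groupKey {
	var keys []groupKey
	for _, s := range series {
		for t := start; t < stop; t += every {
			keys = append(keys, groupKey{series: s, start: t, stop: t + every})
		}
	}
	return keys
}

func main() {
	series := []string{"cpu0", "cpu1", "cpu2"} // N = 3 input tables
	keys := windowKeys(series, 0, 60, 10)      // M = 6 windows
	fmt.Println(len(keys))                     // prints 18
}
```

With thousands of series and a fine-grained interval over a long range, the product grows quickly, which is the allocation pressure described above.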

In addition, these allocations are untracked, which can cause an OOM when there is not enough memory on the system.

This issue is about optimizing newWindowGroupKey so it uses fewer allocations. One potential method might be to refactor group key so it is more friendly to memory allocations as proposed in #1032.

Alternatively, we can find a way to track group key creation in the memory allocator. Or we can do both. These two may be compatible since tracking memory allocations probably involves converting group key to use arrow buffers in some manner and the current interface wouldn't be friendly to doing that either.
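As a rough sketch of what tracking group key creation could look like, the Go example below charges each key against a byte budget before building it. The `trackedAllocator` type, its fields, and `newKey` are hypothetical names invented for this illustration; they are not the Flux memory allocator API, and the size estimate only counts label bytes.

```go
package main

import "fmt"

// trackedAllocator is a hypothetical accounting wrapper: it does not
// allocate memory itself, it only records how many bytes were requested.
type trackedAllocator struct {
	limit int64 // maximum bytes that may be accounted
	used  int64 // bytes accounted so far
}

// account reserves n bytes or returns an error if the limit would be exceeded.
func (a *trackedAllocator) account(n int64) error {
	if a.used+n > a.limit {
		return fmt.Errorf("memory limit exceeded: %d + %d > %d", a.used, n, a.limit)
	}
	a.used += n
	return nil
}

// newKey copies the labels into a new key after accounting for their size.
func (a *trackedAllocator) newKey(labels ...string) ([]string, error) {
	var size int64
	for _, l := range labels {
		size += int64(len(l))
	}
	if err := a.account(size); err != nil {
		return nil, err
	}
	return append([]string(nil), labels...), nil
}

func main() {
	alloc := &trackedAllocator{limit: 16}
	for i := 0; i < 3; i++ {
		// Each key costs 12 bytes, so the second one exceeds the limit.
		if _, err := alloc.newKey("host=a", "window"); err != nil {
			fmt.Println("stopped:", err)
			return
		}
	}
	fmt.Println("used", alloc.used)
}
```

The point of the sketch is that a query can fail with a clear error once its budget is spent, rather than the process running out of memory.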

jsternberg avatar May 06 '19 17:05 jsternberg

This can be fixed by implementing the proposal in #1032

nathanielc avatar Jul 08 '19 16:07 nathanielc

Another way that this can be fixed, as I found out recently, is to optimize window() a bit. The window() transformation recomputes the group key for each new point it looks at. It is likely possible to optimize this so that it generates the group key only once, as long as it is changed to read the data with the time already sorted. This should be a pretty simple check and is also something that could be set by the planner.
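A minimal Go sketch of that idea, for illustration only: when timestamps arrive sorted, the key for the current window can be reused until a point falls outside its bounds. The `windowKey` type and `countKeyBuilds` function are hypothetical, and the sketch assumes non-negative timestamps and non-overlapping windows (the period equals the every interval).

```go
package main

import "fmt"

// windowKey is a hypothetical stand-in for the window part of a group key.
type windowKey struct {
	start, stop int64
}

// countKeyBuilds compares two strategies over the same timestamps:
// perRow counts one key build per point (the naive approach), while
// cached counts builds when the current key is reused for as long as
// the point stays inside its window.
func countKeyBuilds(times []int64, every int64) (perRow, cached int) {
	var cur windowKey
	have := false
	for _, t := range times {
		perRow++ // naive: a key is built for every point
		if !have || t < cur.start || t >= cur.stop {
			start := t - t%every // window start, valid for t >= 0
			cur = windowKey{start: start, stop: start + every}
			have = true
			cached++
		}
	}
	return perRow, cached
}

func main() {
	times := []int64{0, 1, 2, 10, 11, 25} // already sorted by time
	perRow, cached := countKeyBuilds(times, 10)
	fmt.Println(perRow, cached) // prints 6 3
}
```

If the input were not sorted, the cached key would be invalidated far more often, which is why the sortedness check matters.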

jsternberg avatar Nov 01 '19 14:11 jsternberg

We're also seeing poor performance with Flux when using aggregateWindow: the Flux version is about 30 times slower than the equivalent InfluxQL query with GROUP BY(time) (InfluxDB 1.8.2): 17s for Flux vs 0.5s for InfluxQL.

Is there a timeframe for a fix? We're very happy with Influx 1.8, but we'd like to use Flux :)

metoule avatar Sep 09 '20 15:09 metoule

Some examples that I believe show this issue affecting us on v1.8.3:

Flux: 43s [images]

InfluxQL: ~1.5s [image]

Just to eliminate the count() component, here is the same query without the interval, just returning the total: ~3s [image]

I'd like to make use of the advanced capabilities of Flux, but I take a massive performance hit on any type of query that I want to window over time.

FortDigital avatar Jan 07 '21 13:01 FortDigital

Hello,

I have been testing out Flux for a project of mine but seem to be running into this issue. Any aggregated queries (min, max, sum, count, etc.) take forever compared to InfluxQL.

Is there a timeline for a fix for this? I'm really excited to use Flux's advanced capabilities, but it's not worth it if it performs this slowly.

Thanks

webstersteele avatar Feb 28 '21 15:02 webstersteele

@webstersteele sorry for the late response on this. I often miss GitHub notifications and just saw this comment. We're currently experimenting with improving this in this PR. Depending on the query, there are also pushdown optimizations that push down the windowed aggregation, which avoids this section of code. Those pushdowns primarily exist on the cloud version of the InfluxData product, but we're actively working on bringing them to the open source versions and backporting them to the 1.x version.

jsternberg avatar May 06 '21 14:05 jsternberg

@jsternberg Hi, has there been any development/progress in this area for the OSS version?

cTn-dev avatar Oct 11 '21 11:10 cTn-dev

Still having massive performance issues in 2022. Not happening with Influx 1.8. Looking for alternatives right now.

alvarolb avatar Jun 14 '22 10:06 alvarolb

This issue has had no recent activity and will be closed soon.

github-actions[bot] avatar Aug 09 '24 01:08 github-actions[bot]