druid icon indicating copy to clipboard operation
druid copied to clipboard

Use in heap memory to accelerate quantile calculation

Open 599166320 opened this issue 2 years ago • 1 comments

Motivation

When I used the Druid histogram extension to calculate the quantile value, I found that the performance was too poor compared with the exact quantile value of Clickhouse.

Through Arthas's monitor tool to monitor the aggregate function of FixedbuckethistogramBufferAggregator, I found that nearly 40% of the overhead is spent on serialization and deserialization. In order to avoid the overhead in this aspect, I try to change the intermediate results from the offheap memory to onheap memory of the JVM to avoid the overhead of serialization and deserialization of intermediate results. The query performance is optimized from more than 10s to more than 3S.

offheap

image

onheap

image

Proposed changes

Add a cache in the FixedbucketshistogramBufferaggregator class to save the intermediate results.

Rationale

Use jvm onheap memory to save temporary results, avoiding unnecessary serialization and deserialization overhead.

599166320 avatar Aug 15 '22 15:08 599166320

How about using APPROX_QUANTILE_DS to calculate quantile?

FrankChen021 avatar Aug 16 '22 02:08 FrankChen021

How about using APPROX_QUANTILE_DS to calculate quantile?

In most scenarios, we also use APPROX_QUANTILE_DS function to calculate the quantile value, and its intermediate results are also stored in the heap memory, with excellent performance.

There are some problems with APPROX_QUANTILE_DS function. For example, the results are always floating around. Many customers cannot understand the principle and often need us to explain.

There is also a maximum limit on the K value of Druid. If the K value is set too high, the performance will also be affected.

Using Druid-histogram to calculate the quantile value also has some advantages, such as: the result is stable and will not float around.

After knowing the range of data in advance, the calculation result is also accurate by setting a reasonable bucket.

599166320 avatar Aug 16 '22 08:08 599166320

Sounds quite interesting. It should be even better to manipulate the offheap memory directly, using methods like buf.getLong, buf.putLong, etc. Then there won't be any heap memory necessary, which eliminates some copying and GC pressure, and reduces risk of OOM. Most of our high-performance sketch aggregators do this. Would you have any interest in taking that approach and contributing it?

Another note: a new approx quantiles algorithm (KLL) is joining soon: #12498. It won't have SQL bindings at first, but we intend to add them. I know you said you aren't a fan of approximate algorithms, but nevertheless you might find this accurate enough for your use case.

gianm avatar Aug 26 '22 01:08 gianm

@gianm I recently modified a version to provide offheap and onheap options, and it was run in the test environment. If the effect is good, I'm happy to share it.

Thank you for recommending KLL. I will experience it.

599166320 avatar Aug 30 '22 17:08 599166320

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jan 08 '24 00:01 github-actions[bot]

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot] avatar Feb 06 '24 00:02 github-actions[bot]