druid icon indicating copy to clipboard operation
druid copied to clipboard

KLL sketch

Open AlexanderSaydakov opened this issue 2 years ago • 4 comments

This is a module to support new KllFloatsSketch and KllDoublesSketch from Apache DataSketches

AlexanderSaydakov avatar May 02 '22 23:05 AlexanderSaydakov

Very cool! Thanks @AlexanderSaydakov!

Has this code been used in a production environment? Any gotches we should know about?

There appears to be a legitimate issue in KllDoublesSketchAggregatorTest.buildingSketchesAtQueryTime when SQL-compatible null handling is enabled: https://app.travis-ci.com/github/apache/druid/jobs/569163704

[ERROR] org.apache.druid.query.aggregation.datasketches.kll.KllFloatsSketchAggregatorTest.buildingSketchesAtQueryTime[groupByConfig = v2, vectorize = true]
[ERROR]   Run 1: KllFloatsSketchAggregatorTest.buildingSketchesAtQueryTime:367 expected:<377> but was:<355>
[ERROR]   Run 2: KllFloatsSketchAggregatorTest.buildingSketchesAtQueryTime:367 expected:<377> but was:<355>
[ERROR]   Run 3: KllFloatsSketchAggregatorTest.buildingSketchesAtQueryTime:367 expected:<377> but was:<355>
[ERROR]   Run 4: KllFloatsSketchAggregatorTest.buildingSketchesAtQueryTime:367 expected:<377> but was:<355>

gianm avatar May 09 '22 16:05 gianm

This code is included in our datasketches-java 3.2.0 release dated 27 Apr 2022. There are no gotches we know about, yet :)

leerho avatar May 09 '22 17:05 leerho

@gianm

Has this code been used in a production environment? Any gotches we should know about?

No, this code has not been used yet. I am not aware of any gotchas. I believe I fixed that error in the test you pointed out.

AlexanderSaydakov avatar May 12 '22 21:05 AlexanderSaydakov

FYI I haven't had a chance to take a look at this but it is in my queue. If anyone gets to review it before I do then thanks!

gianm avatar May 26 '22 05:05 gianm

Two test runs failed with the below segfault in KllDoublesSketchAggregatorTest (first time) and KllFloatsSketchAggregatorTest (second time).

Judging by the fact it's BaseWritableMemoryImpl.getNativeOrderedLong, it could be an out-of-bound access or a use-after-free. I tried running the test case in a loop in my IDE, and unfortunately wasn't able to repro this.

In looking at the code, I suspect it's due to a use-after-free due to returning an object from KllSketchBuildBufferAggregatorHelper.get that references the underlying buffer. That's used to implement BufferAggregator.get and VectorAggregator.get, which by contract aren't permitted to return objects that reference the underlying buffer. (It may be freed, or reused, while those get'ed objects still exist.)

Similar code in other sketch helpers creates on-heap copies (hll uses copy(), quantiles uses compact()). Could we do the same thing here?

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f69d737e924, pid=21102, tid=21103
#
# JRE version: OpenJDK Runtime Environment (11.0.2+9) (build 11.0.2+9)
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 14439 c1 org.apache.datasketches.memory.internal.BaseWritableMemoryImpl.getNativeOrderedLong(J)J (38 bytes) @ 0x00007f69d737e924 [0x00007f69d737e7a0+0x0000000000000184]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/travis/build/apache/druid/extensions-core/datasketches/hs_err_pid21102.log
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

gianm avatar Aug 24 '22 01:08 gianm

So close! There's a conflict in ColumnType, probably from https://github.com/apache/druid/pull/12914, which was a response to the discussion in this PR about better supporting the arrays returned by the KLL postaggs. I took the liberty of resolving the conflict using GitHub's "resolve conflict" feature.

gianm avatar Aug 25 '22 16:08 gianm

Hmm. Another crash.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f193747fdac, pid=21257, tid=0x00007f194d6bd700
#
# JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 14078 C1 org.apache.datasketches.memory.internal.BaseWritableMemoryImpl.getNativeOrderedLong(J)J (38 bytes) @ 0x00007f193747fdac [0x00007f193747fc20+0x18c]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/travis/build/apache/druid/extensions-core/datasketches/hs_err_pid21257.log
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

gianm avatar Aug 25 '22 22:08 gianm

I suspect the additional crash was due to a similar problem, but for unions. I pushed https://github.com/apache/druid/pull/12498/commits/7e117c72c86373027b39b701dd965d164716af52 to the branch which will copy those too. @AlexanderSaydakov if there's a better way to do this, please let me know (or just change it).

gianm avatar Aug 26 '22 00:08 gianm

I suspect the additional crash was due to a similar problem, but for unions. I pushed 7e117c7 to the branch which will copy those too. @AlexanderSaydakov if there's a better way to do this, please let me know (or just change it).

Looks fine to me. Sorry, I missed that part.

AlexanderSaydakov avatar Aug 26 '22 02:08 AlexanderSaydakov

Looks like the latest build passed tests. I'll merge it. Thanks @AlexanderSaydakov !

gianm avatar Aug 27 '22 04:08 gianm