Query Failures (InvalidRoaringFormat)

Open forzamehlano opened this issue 1 year ago • 2 comments

A query is failing on some of our historical nodes with an org.roaringbitmap.InvalidRoaringFormat error.

Affected Version

26.0.0

Description

Our historicals are running with the following config:

runtime.properties: |
  druid.server.tier=hot
  druid.service=druid/historical/hot
  druid.plaintextPort=8083
  druid.server.http.numThreads=100
  druid.processing.buffer.sizeBytes=2000MiB
  druid.processing.numMergeBuffers=5
  druid.processing.numThreads=19
  druid.processing.tmpDir=var/druid/processing
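As a sanity check on the memory side, here is a rough sketch of the direct-memory estimate for these settings. It assumes the sizing rule from the Druid tuning docs, buffer.sizeBytes * (numThreads + numMergeBuffers + 1); `parse_size` and `direct_memory_bytes` are hypothetical helper names for illustration, not Druid APIs, and the size parser only handles the plain-number and KiB/MiB/GiB forms.

```python
import re

# Binary size suffixes handled by this sketch (an assumption; Druid accepts more forms).
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a value such as '2000MiB' or '1048576' to a byte count."""
    match = re.fullmatch(r"(\d+)(KiB|MiB|GiB)?", text.strip())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS.get(match.group(2), 1)


def direct_memory_bytes(buffer_size: str, num_threads: int, num_merge_buffers: int) -> int:
    """Estimate direct memory as sizeBytes * (numThreads + numMergeBuffers + 1)."""
    return parse_size(buffer_size) * (num_threads + num_merge_buffers + 1)


# Values taken from the runtime.properties above.
needed = direct_memory_bytes("2000MiB", num_threads=19, num_merge_buffers=5)
print(f"estimated direct memory: {needed / 1024**3:.1f} GiB")
```

With the values above this comes to 25 buffers of 2000 MiB each, so the historical's -XX:MaxDirectMemorySize would need to be at least that large.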

And we're seeing the following error on a handful of historicals:

2023-08-07T09:07:31,673 ERROR [processing-35] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
org.roaringbitmap.InvalidRoaringFormat: I failed to find one of the right cookies. 75124725
    at org.roaringbitmap.buffer.ImmutableRoaringArray.<init>(ImmutableRoaringArray.java:48) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.ImmutableRoaringBitmap.<init>(ImmutableRoaringBitmap.java:1057) ~[RoaringBitmap-0.9.0.jar:?]
    at org.apache.druid.segment.data.RoaringBitmapSerdeFactory$ImmutableRoaringBitmapObjectStrategy.fromByteBuffer(RoaringBitmapSerdeFactory.java:80) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.data.RoaringBitmapSerdeFactory$ImmutableRoaringBitmapObjectStrategy.fromByteBuffer(RoaringBitmapSerdeFactory.java:64) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.data.GenericIndexed$BufferIndexed.get(GenericIndexed.java:497) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex.getBitmap(IndexedUtf8ValueSetIndex.java:128) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex.access$100(IndexedUtf8ValueSetIndex.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex$3$1.next(IndexedUtf8ValueSetIndex.java:230) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex$3$1.next(IndexedUtf8ValueSetIndex.java:205) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory$1$1.next(RoaringBitmapFactory.java:84) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory$1$1.next(RoaringBitmapFactory.java:68) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.roaringbitmap.buffer.BufferFastAggregation.naive_or(BufferFastAggregation.java:620) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.BufferFastAggregation.or(BufferFastAggregation.java:713) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.ImmutableRoaringBitmap.or(ImmutableRoaringBitmap.java:875) ~[RoaringBitmap-0.9.0.jar:?]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory.union(RoaringBitmapFactory.java:142) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.DefaultBitmapResultFactory.unionDimensionValueBitmaps(DefaultBitmapResultFactory.java:73) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.DefaultBitmapResultFactory.unionDimensionValueBitmaps(DefaultBitmapResultFactory.java:25) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.SimpleImmutableBitmapIterableIndex.computeBitmapResult(SimpleImmutableBitmapIterableIndex.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.filter.AndFilter.getBitmapIndex(AndFilter.java:83) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.FilterAnalysis.analyzeFilter(FilterAnalysis.java:107) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.QueryableIndexCursorSequenceBuilder.buildVectorized(QueryableIndexCursorSequenceBuilder.java:213) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.QueryableIndexStorageAdapter.makeVectorCursor(QueryableIndexStorageAdapter.java:236) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.vector.VectorGroupByEngine$1.make(VectorGroupByEngine.java:148) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.vector.VectorGroupByEngine$1.make(VectorGroupByEngine.java:144) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:39) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:98) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:185) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:165) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:77) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:252) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:239) ~[druid-processing-26.0.0.jar:26.0.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:251) ~[druid-processing-26.0.0.jar:26.0.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
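For anyone triaging this: the number at the end of the exception message is the 32-bit value the reader found where it expected a Roaring serialization cookie. Below is a minimal sketch for decoding it, assuming the two cookie constants from the Roaring format spec (12346 for bitmaps without run containers, 12347 in the low 16 bits for bitmaps with them) and a little-endian layout. `describe_cookie` is a hypothetical helper written for this post, not part of Druid or RoaringBitmap, and the validity test is my reading of the format rather than a copy of the library's code.

```python
# Cookie constants from the Roaring bitmap serialization format.
SERIAL_COOKIE_NO_RUNCONTAINER = 12346
SERIAL_COOKIE = 12347


def describe_cookie(cookie: int) -> dict:
    """Break a 32-bit cookie value into the parts the format cares about."""
    unsigned = cookie & 0xFFFFFFFF
    low16 = unsigned & 0xFFFF
    high16 = unsigned >> 16
    # Valid if the low 16 bits are SERIAL_COOKIE (high bits then carry the
    # container count minus one) or the whole value is the no-run cookie.
    valid = low16 == SERIAL_COOKIE or unsigned == SERIAL_COOKIE_NO_RUNCONTAINER
    return {
        "hex": f"0x{unsigned:08X}",
        "low16": low16,
        "high16": high16,
        "le_bytes": unsigned.to_bytes(4, "little"),
        "valid": valid,
    }


# Value reported in the exception message above.
info = describe_cookie(75124725)
print(info)
```

For 75124725 the low 16 bits come out as 20469, which matches neither cookie, so the bytes at that offset do not look like a Roaring header at all rather than a header with one damaged field.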

The query itself can be summarised as:

SELECT column1
FROM (
  SELECT DISTINCT column1, column2
  FROM table
  WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '3' DAY
    AND column1 IN ('list_of_values')
)
GROUP BY column1

Memory and CPU don't appear to be the problem. We have tried wiping all historicals and reloading the data fresh from deep storage, but the error persists.

Not really sure where to go from here...

forzamehlano, Aug 07 '23 10:08