Query Failures (InvalidRoaringFormat)
When we run a query, it fails on some of the historical nodes with an org.roaringbitmap.InvalidRoaringFormat error.
Affected Version
26.0.0
Description
Our historicals are running with the following config:
runtime.properties: |
  druid.server.tier=hot
  druid.service=druid/historical/hot
  druid.plaintextPort=8083
  druid.server.http.numThreads=100
  druid.processing.buffer.sizeBytes=2000MiB
  druid.processing.numMergeBuffers=5
  druid.processing.numThreads=19
  druid.processing.tmpDir=var/druid/processing
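For reference when ruling out memory, the Druid documentation gives the direct memory a historical needs for processing as buffer.sizeBytes * (numMergeBuffers + numThreads + 1). Below is a minimal sketch of that arithmetic for the values above. The class and method names are hypothetical, and the result only means something when compared against the JVM's -XX:MaxDirectMemorySize, which is not shown in this report.

```java
// Hypothetical helper for illustration; not part of Druid.
public class DirectMemoryEstimate {

    // Documented Druid sizing rule:
    // sizeBytes * (numMergeBuffers + numThreads + 1)
    static long requiredDirectMemoryBytes(long bufferSizeBytes, int numMergeBuffers, int numThreads) {
        return bufferSizeBytes * (numMergeBuffers + numThreads + 1L);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long bufferSizeBytes = 2000L * mib; // druid.processing.buffer.sizeBytes=2000MiB
        int numMergeBuffers = 5;            // druid.processing.numMergeBuffers
        int numThreads = 19;                // druid.processing.numThreads

        long required = requiredDirectMemoryBytes(bufferSizeBytes, numMergeBuffers, numThreads);
        System.out.println("Required direct memory: " + (required / mib) + " MiB");
    }
}
```

With these settings the historical needs roughly 50,000 MiB of direct memory for processing buffers alone.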
And we're seeing the following error on a handful of historicals:
2023-08-07T09:07:31,673 ERROR [processing-35] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
org.roaringbitmap.InvalidRoaringFormat: I failed to find one of the right cookies. 75124725
    at org.roaringbitmap.buffer.ImmutableRoaringArray.<init>(ImmutableRoaringArray.java:48) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.ImmutableRoaringBitmap.<init>(ImmutableRoaringBitmap.java:1057) ~[RoaringBitmap-0.9.0.jar:?]
    at org.apache.druid.segment.data.RoaringBitmapSerdeFactory$ImmutableRoaringBitmapObjectStrategy.fromByteBuffer(RoaringBitmapSerdeFactory.java:80) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.data.RoaringBitmapSerdeFactory$ImmutableRoaringBitmapObjectStrategy.fromByteBuffer(RoaringBitmapSerdeFactory.java:64) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.data.GenericIndexed$BufferIndexed.get(GenericIndexed.java:497) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex.getBitmap(IndexedUtf8ValueSetIndex.java:128) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex.access$100(IndexedUtf8ValueSetIndex.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex$3$1.next(IndexedUtf8ValueSetIndex.java:230) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.IndexedUtf8ValueSetIndex$3$1.next(IndexedUtf8ValueSetIndex.java:205) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory$1$1.next(RoaringBitmapFactory.java:84) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory$1$1.next(RoaringBitmapFactory.java:68) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.roaringbitmap.buffer.BufferFastAggregation.naive_or(BufferFastAggregation.java:620) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.BufferFastAggregation.or(BufferFastAggregation.java:713) ~[RoaringBitmap-0.9.0.jar:?]
    at org.roaringbitmap.buffer.ImmutableRoaringBitmap.or(ImmutableRoaringBitmap.java:875) ~[RoaringBitmap-0.9.0.jar:?]
    at org.apache.druid.collections.bitmap.RoaringBitmapFactory.union(RoaringBitmapFactory.java:142) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.DefaultBitmapResultFactory.unionDimensionValueBitmaps(DefaultBitmapResultFactory.java:73) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.DefaultBitmapResultFactory.unionDimensionValueBitmaps(DefaultBitmapResultFactory.java:25) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.column.SimpleImmutableBitmapIterableIndex.computeBitmapResult(SimpleImmutableBitmapIterableIndex.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.filter.AndFilter.getBitmapIndex(AndFilter.java:83) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.FilterAnalysis.analyzeFilter(FilterAnalysis.java:107) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.QueryableIndexCursorSequenceBuilder.buildVectorized(QueryableIndexCursorSequenceBuilder.java:213) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.segment.QueryableIndexStorageAdapter.makeVectorCursor(QueryableIndexStorageAdapter.java:236) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.vector.VectorGroupByEngine$1.make(VectorGroupByEngine.java:148) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.vector.VectorGroupByEngine$1.make(VectorGroupByEngine.java:144) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:39) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:98) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:185) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:44) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:165) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:77) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:252) ~[druid-processing-26.0.0.jar:26.0.0]
    at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:239) ~[druid-processing-26.0.0.jar:26.0.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:251) ~[druid-processing-26.0.0.jar:26.0.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
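To help read the message: the number at the end of the exception appears to be the 32-bit value found where the Roaring serialization header (the "cookie") was expected. The Roaring format specification allows either 12346 exactly, or 12347 in the low 16 bits. The sketch below is a stdlib-only illustration of that check under those assumptions. RoaringCookieCheck and its methods are hypothetical names, not RoaringBitmap or Druid APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of RoaringBitmap or Druid.
public class RoaringCookieCheck {

    // Cookie values defined by the Roaring serialization format.
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;
    static final int SERIAL_COOKIE = 12347;

    // The header is valid if it is exactly 12346, or if its low 16 bits are 12347.
    static boolean isValidCookie(int cookie) {
        return (cookie & 0xFFFF) == SERIAL_COOKIE || cookie == SERIAL_COOKIE_NO_RUNCONTAINER;
    }

    // Reads the first four bytes as a little-endian int, as the format specifies.
    // Requires at least four bytes.
    static int readCookie(byte[] serialized) {
        return ByteBuffer.wrap(serialized).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    static String describe(int cookie) {
        return String.format("cookie=%d (0x%08X), low16=%d, valid=%b",
                cookie, cookie, cookie & 0xFFFF, isValidCookie(cookie));
    }

    public static void main(String[] args) {
        // Value reported in the exception above.
        System.out.println(describe(75124725));
        // A well-formed header without run containers, as little-endian bytes.
        System.out.println(describe(readCookie(new byte[] {0x3A, 0x30, 0x00, 0x00})));
    }
}
```

The reported value 75124725 is 0x047A4FF5, whose low 16 bits are 20469, so it matches neither cookie. That suggests the bytes at that offset are not the start of a serialized bitmap, though the sketch cannot say whether the cause is segment corruption or a wrong read offset.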
The query itself can be summarised as:
SELECT column1
FROM (
  SELECT DISTINCT column1, column2
  FROM table
  WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '3' DAY
    AND column1 IN ('list_of_values')
)
GROUP BY column1
Memory and CPU don't appear to be the issue. We have tried wiping all historicals and reloading the data fresh from deep storage, but the error persists.
Not really sure where to go from here...