Coordinator crashes trying to compact a datasource ingested through MSQ with PARTITIONED BY ALL
Affected Version
Tested on latest master branch 25.0.0-SNAPSHOT
Description
Ingested a JSON file with only the following two rows through MSQ:
% cat lookup_data_small.json
{"k": "stringggggggggggggggggggggggggggggggggggggggggg0", "v": "stringValuegggggggggggggggggggggggggggggggggggggg0"}
{"k": "stringggggggggggggggggggggggggggggggggggggggggg1", "v": "stringValuegggggggggggggggggggggggggggggggggggggg1"}
%
I just kept clicking through the web-console screens, taking whatever defaults were provided. The generated spec produced an INSERT with a PARTITIONED BY ALL clause. The interval detected for this is:
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
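For context, that odd-looking interval is simply Druid's eternity interval, i.e. the full range of instants Druid can represent. A minimal sketch, assuming the Intervals and JodaUtils utilities from Druid's java-util package (the class name below is illustrative):

```java
// Minimal sketch: the interval reported above is Druid's "eternity" interval.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.JodaUtils;

public class EternityIntervalDemo
{
  public static void main(String[] args)
  {
    // JodaUtils.MIN_INSTANT / MAX_INSTANT are roughly Long.MIN_VALUE / 2 and Long.MAX_VALUE / 2
    // millis, which Joda-Time renders as the -146136543-.../146140482-... timestamps seen above.
    System.out.println(Intervals.ETERNITY);
    System.out.println(JodaUtils.MIN_INSTANT + " .. " + JodaUtils.MAX_INSTANT);
  }
}
```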
After the table is ingested, configure auto-compaction on it. To do this I simply changed the default config to the following:
{
"dataSource": "lookup_data_small",
"granularitySpec": {
"segmentGranularity": "DAY"
},
"tuningConfig": {
"partitionsSpec": {
"type": "dynamic"
}
}
}
The only difference here is the addition of the granularitySpec.
Wait for the next Coordinator indexing cycle to run (I shortened it by setting druid.coordinator.period.indexingPeriod=PT60s in the Coordinator runtime properties file).
This causes the Coordinator to take up a ton of CPU and memory and eventually crash. I believe the issue is in building the interval-to-segments map here: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java#L121
Thanks for reporting this.
I was looking into this, and the issue seems to be in how we generate the probable time chunks for an interval:
https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java#L131
In this case, the interval is effectively (-infinity, +infinity) because of PARTITIONED BY ALL. When we set the segment granularity to DAY in the compaction spec, we generate all probable DAY time chunks, which results in a heap issue for the Coordinator.
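To make the blow-up concrete, here is a rough sketch; Granularities.DAY.getIterable and Intervals.ETERNITY are the real Druid APIs, while the class name and the counting loop are only for illustration and are not the actual NewestSegmentFirstIterator code path:

```java
// Rough illustration: iterating DAY buckets over the eternity interval produced by
// PARTITIONED BY ALL yields roughly 1.07e11 intervals, so any map keyed by those
// chunks exhausts the Coordinator heap.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.joda.time.Interval;

public class DayChunksOverEternity
{
  public static void main(String[] args)
  {
    long count = 0;
    for (Interval day : Granularities.DAY.getIterable(Intervals.ETERNITY)) {
      count++;
      if (count == 5) {
        // Stop early here; the real code keeps going and tries to hold every chunk in memory.
        System.out.println("chunk " + count + ": " + day + " (about 1.07e11 chunks remain)");
        break;
      }
    }
  }
}
```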
Should we add a check/validation that fails the compaction when it changes the segment granularity to something other than ALL for a datasource whose segment granularity is ALL?
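One possible shape for that validation, as a hedged sketch rather than existing Druid code (the class and method names are hypothetical; Granularities, DataSegment, and Intervals.ETERNITY are real Druid types):

```java
// Hypothetical validation sketch: reject the compaction config when ALL-granularity
// (eternity-interval) segments would be re-partitioned to a finer granularity.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.timeline.DataSegment;

import java.util.Collection;

public class CompactionGranularityCheck
{
  public static boolean wouldExplodeTimeChunks(
      Collection<DataSegment> segments,
      Granularity configuredSegmentGranularity
  )
  {
    if (Granularities.ALL.equals(configuredSegmentGranularity)) {
      // Compacting ALL to ALL is fine; only finer granularities enumerate time chunks.
      return false;
    }
    // Any eternity-sized segment means the candidate interval is unbounded.
    return segments.stream().anyMatch(s -> Intervals.ETERNITY.equals(s.getInterval()));
  }
}
```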
Open to ideas for another approach as well.
cc @gianm @paul-rogers @maytasm
@cryptoe, are we doing the probabilistic analysis to limit the memory footprint when we have a large number of segments? If so, we can start by looking at the total count of segments. If the segment count is smaller than some number, I'd assume we can use a deterministic analysis. That is, if there is one segment, we don't need to sample: the one segment will fit in memory just fine.
@cryptoe, the other thought is that if we're working out a timeline, there are two ways to proceed. One is to enumerate all possible chunks and map segments into those chunks. The other is to sort and group. We should choose the approach that minimizes memory use.
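A hedged sketch of the "sort and group" idea, not the existing Coordinator logic: derive the bucket for each segment from the segment itself via Granularity.bucket, so memory scales with the number of segments rather than with the number of possible time chunks. Class and method names below are illustrative:

```java
// Illustrative "sort and group" sketch: bucket each segment by its own start time so only
// buckets that actually contain data are ever materialized.
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.timeline.DataSegment;
import org.joda.time.Interval;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortAndGroupTimeline
{
  public static Map<Interval, List<DataSegment>> group(
      List<DataSegment> segments,
      Granularity segmentGranularity
  )
  {
    // Sorted map keyed by bucket start; its size is bounded by the number of segments.
    Map<Interval, List<DataSegment>> grouped =
        new TreeMap<>(Comparator.comparingLong(Interval::getStartMillis));
    for (DataSegment segment : segments) {
      // A segment spanning several buckets would need to be attributed to each of them;
      // that split is omitted here to keep the sketch short.
      Interval bucket = segmentGranularity.bucket(segment.getInterval().getStart());
      grouped.computeIfAbsent(bucket, b -> new ArrayList<>()).add(segment);
    }
    return grouped;
  }
}
```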
It'd be better to make this work than to make it an error. The way to do that is to avoid Granularity.getIterable, which is problematic when using non-ALL granularities on a large interval. There are similar issues on the query side, also caused by this behavior of Granularity.getIterable: see #12944, #13182, #13206. It would require adjusting some logic "inside-out".
Would it help to change the interval in the compaction spec and use it as a boundary for the generated time chunks? This may not work if the data is spread sparsely across a long time period, though.
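A purely illustrative sketch of how such a boundary could be applied, clamping chunk generation to the spec-supplied interval intersected with the umbrella of the segments' intervals (the class and parameter names are hypothetical; JodaUtils.umbrellaInterval, Granularity.getIterable, and Joda's Interval.overlap are real APIs):

```java
// Illustrative sketch of the "boundary" idea: generate chunks only inside an interval
// taken from the compaction spec, clamped to the umbrella of the segments.
import org.apache.druid.java.util.common.JodaUtils;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.timeline.DataSegment;
import org.joda.time.Interval;

import java.util.Collection;
import java.util.Collections;
import java.util.stream.Collectors;

public class BoundedChunkGeneration
{
  public static Iterable<Interval> chunksWithinBoundary(
      Interval boundaryFromSpec,
      Collection<DataSegment> segments,
      Granularity segmentGranularity
  )
  {
    // Smallest single interval covering every segment; still eternity when the segments
    // themselves were ingested with PARTITIONED BY ALL, hence the need for the spec boundary.
    Interval umbrella = JodaUtils.umbrellaInterval(
        segments.stream().map(DataSegment::getInterval).collect(Collectors.toList())
    );
    // Joda returns null when the two intervals do not overlap.
    Interval bounded = boundaryFromSpec.overlap(umbrella);
    return bounded == null
           ? Collections.<Interval>emptyList()
           : segmentGranularity.getIterable(bounded);
  }
}
```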
https://github.com/apache/druid/pull/13304 skips the search for compaction when segments of ALL granularity would be compacted to a finer granularity. This prevents the Coordinator crash.
The compaction itself from ALL to finer granularities was not implemented, since Granularity.getIterable would lead to too many partitions; an alternative would be needed for a complete fix.