
Coordinator crashes trying to compact a datasource ingested through MSQ with PARTITIONED BY ALL

Open zachjsh opened this issue 2 years ago • 4 comments


Affected Version

Tested on latest master branch 25.0.0-SNAPSHOT

Description

Ingested a JSON file with only the following two rows through MSQ:

 % cat lookup_data_small.json 
{"k": "stringggggggggggggggggggggggggggggggggggggggggg0", "v": "stringValuegggggggggggggggggggggggggggggggggggggg0"}
{"k": "stringggggggggggggggggggggggggggggggggggggggggg1", "v": "stringValuegggggggggggggggggggggggggggggggggggggg1"}
%

I just kept clicking through the web-console screens, accepting whatever defaults were provided. The generated spec produced an insert with a PARTITIONED BY ALL clause. The interval it detected is:

"intervals": {
        "type": "intervals",
        "intervals": [
          "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
        ]
      },

After the table is ingested, configure auto-compaction on it. To do this I simply changed the default config to the following:

{
  "dataSource": "lookup_data_small",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic"
    }
  }
}

The only difference here is the addition of the granularitySpec.

Wait for the next coordinator indexing cycle to run (I shortened the cycle by setting druid.coordinator.period.indexingPeriod=PT60s in the coordinator runtime properties file).

This causes the Coordinator to crash and take up a ton of CPU and memory before doing so. I believe the issue is in building the interval to segments map here: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java#L121

zachjsh avatar Oct 11 '22 05:10 zachjsh

Thanks for reporting this.

I was looking into this and the issue seems to be in how we are generating the probable time chunks for an interval.

https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java#L131

In this case, the interval is (-infinity, +infinity) because of PARTITIONED BY ALL. When we set the segment granularity to DAY in the compaction spec, we generate all probable DAY time chunks, which results in a heap issue on the coordinator.
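For a rough sense of scale, here is a back-of-the-envelope sketch (not Druid code; it assumes the eternity interval's endpoints are Long.MIN_VALUE / 2 and Long.MAX_VALUE / 2 milliseconds, which matches the interval shown above) of how many DAY chunks that enumeration would produce:

```java
public class EternityDayChunks {
    public static void main(String[] args) {
        // Assumed endpoints of Druid's eternity interval, in epoch millis,
        // matching "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z".
        long startMillis = Long.MIN_VALUE / 2;
        long endMillis = Long.MAX_VALUE / 2;

        long millisPerDay = 24L * 60 * 60 * 1000;
        // Divide each endpoint first so the subtraction cannot overflow a long.
        long dayChunks = endMillis / millisPerDay - startMillis / millisPerDay;

        // Roughly 1.07e11 DAY chunks: materializing an object per chunk,
        // as iterating all probable time chunks requires, exhausts the
        // coordinator heap long before the loop finishes.
        System.out.println("Approximate DAY chunks in eternity: " + dayChunks);
    }
}
```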

Should we add a check/validation that fails the compaction when it changes the segment granularity to something other than ALL for a data source whose segment granularity is ALL?

Open to ideas for another approach as well.

cc @gianm @paul-rogers @maytasm

cryptoe avatar Oct 12 '22 13:10 cryptoe

@cryptoe, are we doing the probabilistic analysis to limit memory footprint when we have a large number of segments? If so, we can start by looking at the total count of segments. If the count is smaller than some threshold, I'd assume we can use a deterministic analysis. That is, if there is one segment, we don't need to sample: the one segment will fit in memory just fine.

paul-rogers avatar Oct 12 '22 16:10 paul-rogers

@cryptoe, the other thought is that if we're working out a timeline, there are two ways to proceed. One is to enumerate all possible chunks and map segments into those chunks. The other approach is to sort and group. We should choose the approach that minimizes memory use.
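A minimal sketch of the sort-and-group idea (hypothetical code, not Druid's actual implementation; segments are represented by their start millis only): compute each segment's bucket directly instead of iterating every possible chunk, so memory use is proportional to the number of segments rather than the number of chunks in the interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketByComputation {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Group segments by the DAY chunk they fall into. O(segments) memory,
    // regardless of how wide the overall interval is.
    static Map<Long, List<Long>> groupByDay(long[] segmentStarts) {
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long start : segmentStarts) {
            // floorDiv handles negative (pre-epoch) timestamps correctly.
            long bucketStart = Math.floorDiv(start, MILLIS_PER_DAY) * MILLIS_PER_DAY;
            buckets.computeIfAbsent(bucketStart, k -> new ArrayList<>()).add(start);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] starts = {0L, 1_000L, MILLIS_PER_DAY + 5L};
        // Two buckets: one for day 0, one for day 1.
        System.out.println(groupByDay(starts));
    }
}
```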

paul-rogers avatar Oct 12 '22 16:10 paul-rogers

It'd be better to make this work than to make this be an error. The way to do it is to avoid Granularity.getIterable, which is problematic when using non-ALL granularities on a large interval. There are similar issues on the query side, also caused by this behavior from Granularity.getIterable: see #12944, #13182, #13206. It would require adjusting some logic "inside-out".

gianm avatar Oct 12 '22 23:10 gianm

Would it help to change the interval in the compaction spec and use it as a boundary for the generated time chunks? This may not work if the data is spread sparsely across a long time period, though.

AmatyaAvadhanula avatar Oct 26 '22 04:10 AmatyaAvadhanula

https://github.com/apache/druid/pull/13304 skips the search for compaction when segments of ALL granularity are being compacted to a finer granularity. This prevents the coordinator crash issue.

The compaction itself from ALL to finer granularities was not implemented, as Granularity.getIterable would lead to too many partitions; an alternative would be needed for a complete fix.
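The guard described above can be sketched as follows (a simplified stand-in, not the code from the PR; the real check operates on Druid's Granularity objects rather than this hypothetical enum):

```java
public class SkipFinerCompaction {
    // Hypothetical stand-in for Druid's Granularity class.
    enum Gran { ALL, YEAR, MONTH, DAY, HOUR }

    // Skip the compaction search when existing segments cover eternity
    // (ALL granularity) but the spec asks for a finer granularity, which
    // would otherwise enumerate on the order of 1e11 time chunks.
    static boolean shouldSkip(Gran existing, Gran target) {
        return existing == Gran.ALL && target != Gran.ALL;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip(Gran.ALL, Gran.DAY));  // skipped
        System.out.println(shouldSkip(Gran.DAY, Gran.HOUR)); // proceeds
    }
}
```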

AmatyaAvadhanula avatar Nov 04 '22 02:11 AmatyaAvadhanula