The SegmentMetadata query returns the thetaSketch column type incorrectly in real-time ingestion range

Open jamangstangs opened this issue 1 year ago • 2 comments

Environment

  • Apache Druid: 26.0.0
  • Kafka: 2.7.1

Description

I am using Kafka ingestion and submit the ingestion spec as follows.

...
    "metricsSpec": [
      {
        "name": "uniq_column1",
        "type": "thetaSketch",
        "fieldName": "uniq_column1",
        "size": 16384
      },
      {
        "name": "uniq_column2",
        "type": "thetaSketch",
        "fieldName": "uniq_column2",
        "size": 16384
      }
    ]
...
    "tuningConfig": {
      "type": "kafka",
      "maxRowsPerSegment": 1000000000,
      "maxTotalRows": 1000000000,
      "maxBytesInMemory": -1
    },
...
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "SECOND",
      "rollup": true
    }
...
    "taskDuration": "PT1H"

When I run a segmentMetadata query, the thetaSketch columns are returned with both type and typeSignature set to STRING instead of thetaSketch.

{
      "queryType": "segmentMetadata",
      "dataSource": "datasource",
      "merge": true
}
column        typeSignature  type    errorMessage
uniq_column1  STRING         STRING  error:cannot_merge_diff_types: [thetaSketch] and [thetaSketchBuild]
uniq_column2  STRING         STRING  error:cannot_merge_diff_types: [thetaSketch] and [thetaSketchBuild]
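
For anyone trying to reproduce this, the query above can be sent to the Broker's native query endpoint (`/druid/v2/`). The following is a minimal Python sketch, not part of the original report: the helper names (`build_segment_metadata_query`, `run_query`, `summarize_columns`) and the Broker address in the usage note are assumptions for illustration.

```python
import json
import urllib.request


def build_segment_metadata_query(datasource, intervals=None, merge=True):
    """Build a native segmentMetadata query as a plain dict (hypothetical helper)."""
    query = {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "merge": merge,
    }
    if intervals:
        # Only add "intervals" when the caller wants to restrict the time range.
        query["intervals"] = list(intervals)
    return query


def run_query(broker_url, query, timeout=30):
    """POST a native query to the Broker and return the decoded JSON response."""
    request = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_columns(results):
    """Return (column, type, errorMessage) tuples from a segmentMetadata response."""
    rows = []
    for analysis in results:
        for name, info in sorted(analysis.get("columns", {}).items()):
            rows.append((name, info.get("type"), info.get("errorMessage")))
    return rows
```

Assuming a Broker reachable at `http://localhost:8082` (adjust for your cluster), `summarize_columns(run_query("http://localhost:8082", build_segment_metadata_query("datasource")))` should list one row per column, including any `errorMessage`.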

However, when I restrict the query interval so that it excludes the real-time ingestion range, the query returns the correct type.

{
      "queryType": "segmentMetadata",
      "dataSource": "datasource",
      "merge": true,
      "intervals": ["2024-08-30T04:00:00.000Z/2024-09-01T23:00:00.000Z"]
}
column        typeSignature         type         errorMessage
uniq_column1  COMPLEX<thetaSketch>  thetaSketch  null
uniq_column2  COMPLEX<thetaSketch>  thetaSketch  null
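
As a possible stopgap, the interval can be computed so that it always ends before the hours that real-time tasks may still be covering. This is only a sketch: the function names are hypothetical, and the default two-hour window is an assumption derived from the HOUR segment granularity and PT1H task duration above, not a documented rule.

```python
from datetime import datetime, timedelta, timezone


def format_druid_ts(ts):
    """Format a timezone-aware datetime as an ISO-8601 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


def interval_excluding_realtime(start, now, realtime_window=timedelta(hours=2)):
    """Build a start/end interval whose end stops before the real-time range.

    Both arguments must be timezone-aware datetimes. The end is `now` minus
    `realtime_window`, rounded down to the start of the hour (assumed window).
    """
    cutoff = (now.astimezone(timezone.utc) - realtime_window).replace(
        minute=0, second=0, microsecond=0
    )
    if cutoff <= start:
        raise ValueError("start is not earlier than the real-time cutoff")
    return f"{format_druid_ts(start)}/{format_druid_ts(cutoff)}"
```

For example, with `start` at 2024-08-30T04:00Z and `now` at 2024-09-02T01:15Z, this yields `2024-08-30T04:00:00.000Z/2024-09-01T23:00:00.000Z`, the interval used in the query above.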

I also run a Druid 0.21.0 cluster, and the same query returns the correct type there.

{
      "queryType": "segmentMetadata",
      "dataSource": "datasource",
      "merge": true
}
column        type         errorMessage
uniq_column1  thetaSketch  null
uniq_column2  thetaSketch  null

The merge seems to fail specifically for thetaSketch columns when the query covers the real-time ingestion range. A similar issue was already fixed in https://github.com/apache/druid/issues/3339, but version 26.0.0 is still affected.
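
The error message suggests that the segments being merged report two different type names for the same column. One way to see which segments report which name is to rerun the query with `merge` set to false and group the per-segment results. The sketch below assumes each entry in the unmerged response carries an `id` and a `columns` map; the function names are hypothetical.

```python
from collections import defaultdict


def types_by_column(results):
    """Map column name -> {type name -> [segment ids]} from an unmerged response."""
    seen = defaultdict(lambda: defaultdict(list))
    for analysis in results:
        segment_id = analysis.get("id", "<unknown>")
        for name, info in analysis.get("columns", {}).items():
            seen[name][info.get("type")].append(segment_id)
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {name: dict(types) for name, types in seen.items()}


def conflicting_columns(results):
    """Keep only the columns that were reported with more than one type name."""
    return {
        name: types
        for name, types in types_by_column(results).items()
        if len(types) > 1
    }
```

If the output shows `thetaSketchBuild` only for segments in the current ingestion window and `thetaSketch` for everything else, that would be consistent with the behavior described above.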

Is there a workaround for this, or has it been fixed in a newer version of Druid?

jamangstangs, Sep 01 '24 23:09

@findingrish Is this something you can look into?

cryptoe, Sep 06 '24 03:09

I tested with Druid 30.0.0, and the issue still occurs.

jamangstangs, Oct 21 '24 23:10

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot], Jul 29 '25 00:07

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot], Aug 27 '25 00:08