Druid returns rows with null values even though `is not null` is used along with other conditions
Affected Version
Versions 25.0.0 and 29.0.1
Description
Please see the following screenshots for details.
- When we apply both of these conditions: `payment_card_type is not null AND payment_card_type <> 'AMEX'`, rows with `payment_card_type = null` will still show up.
- When we apply both of these conditions: `payment_card_type != 'null' AND payment_card_type <> 'AMEX'`, rows with `payment_card_type = null` will still show up.
- When we apply only this condition: `payment_card_type is not null`, rows with `payment_card_type = null` will NOT show up.
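For context, under standard SQL three-valued logic both combined filters above should exclude the null rows, which is why this behavior looks like a bug. A minimal Python model of SQL's comparison semantics (my own illustration for this thread, not Druid code):

```python
# Model SQL three-valued logic: comparisons with NULL yield UNKNOWN (here: None),
# and a WHERE clause keeps a row only when the predicate is strictly True.
def sql_ne(a, b):
    """SQL '<>' / '!=': UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b

def sql_is_not_null(a):
    return a is not None

def sql_and(x, y):
    # SQL AND: False if either side is False; UNKNOWN if any side is UNKNOWN.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

rows = ['VISA', 'AMEX', None]

# Filter 1: payment_card_type IS NOT NULL AND payment_card_type <> 'AMEX'
kept1 = [r for r in rows if sql_and(sql_is_not_null(r), sql_ne(r, 'AMEX')) is True]

# Filter 2: payment_card_type != 'null' AND payment_card_type <> 'AMEX'
# Note 'null' here is the *string* 'null', so a real NULL makes the predicate
# UNKNOWN, and the row is still excluded from the result.
kept2 = [r for r in rows if sql_and(sql_ne(r, 'null'), sql_ne(r, 'AMEX')) is True]

print(kept1)  # ['VISA'] -- NULL rows excluded under SQL-compatible semantics
print(kept2)  # ['VISA'] -- NULL row dropped because NULL != 'null' is UNKNOWN
```

In both cases the null row should be filtered out, so a query that still returns it suggests the null handling mode of the cluster is involved.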
Hi any updates?
Hi @carltal,
Do you have any sample data for debugging this issue? I tried to generate some sample data and ran a few queries on Druid 29.0.1, and I see that the queries return correct values in all cases. Screenshots below:
- `"merchant_id" = 'M101' AND "__time" >= '2023-01-01' AND "__time" < '2023-12-01'` (all records)
- `"merchant_id" = 'M101' AND "__time" >= '2023-01-01' AND "__time" < '2023-12-01' AND "payment_card_type" is not null AND "payment_card_type" <> 'AMEX'`
- `"merchant_id" = 'M101' AND "__time" >= '2023-01-01' AND "__time" < '2023-12-01' AND "payment_card_type" != null AND "payment_card_type" <> 'AMEX'`
- `"merchant_id" = 'M101' AND "__time" >= '2023-01-01' AND "__time" < '2023-12-01' AND ("payment_card_type" != null OR "payment_card_type" <> 'AMEX')`
- `"merchant_id" = 'M101' AND "__time" >= '2023-01-01' AND "__time" < '2023-12-01' AND "payment_card_type" is not null`
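In case it helps reproduce my setup, the sample data I generated was along these lines (a sketch only; the merchant ID, card types, and timestamps are invented for illustration):

```python
import csv
import random
from datetime import datetime, timedelta

# Generate a small synthetic CSV matching the columns discussed in this thread.
# None becomes an empty CSV field, which Druid can ingest as a null value.
random.seed(42)
card_types = ['VISA', 'AMEX', 'MASTERCARD', None]

start = datetime(2023, 1, 1)
with open('druid_data.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['ts', 'merchant_id', 'payment_card_type'])
    for i in range(200):
        ts = (start + timedelta(hours=i)).strftime('%Y-%m-%dT%H:%M:%SZ')
        w.writerow([ts, 'M101', random.choice(card_types)])
```

This writes 200 hourly rows, a mix of real card types and nulls, so all five filter variations above have something to match.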
Hi @ashwintumma23, greetings. May I know what kind of sample data you would like to have?
I just reproduced the issue on Druid 25.0.0. On Druid 29.0.1, I am getting an empty string instead of null.
Hi @carltal, can you share the format of the underlying data? And in terms of sample data, based on the format, can you export a couple of hundred records and upload them to Gist, Dropbox, or Google Drive? Thanks!
Hi @ashwintumma23, so payment_card_type is just VARCHAR
In terms of sample data, do you need all the rows in that particular datasource? I might not be able to provide them for compliance reasons (though the data are fake, our risk team may still not approve).
Also, I am not sure how to export data from Druid.
Thanks.
Hi @carltal, we won't need all the rows; a sample of a couple of hundred records should suffice. Not all columns are required either, and it is fine if the data is synthetically generated. In the interim, can you also try to load the sample data that I referenced above for my tests into your cluster, and run the queries to see if you get the same results as on my cluster? Thanks!
Hi @ashwintumma23, I see. I wouldn't be able to load that data as merchant_id is a BIGINT in our schema. Also, it does not contain all the columns in my schema.
For the sample data from my end, can you tell me which columns are needed? Also, is there a way to export that data from the Druid UI?
Thanks.
Hi @carltal,
> I wouldn't be able to load that data as merchant_id is a BIGINT in our schema. Also it does not contain all the columns in my schema.
Would it be possible to load the data as a new data source itself, and then try to run the queries?
> Also is there a way to export that data from Druid UI?

Not sure if it would help entirely with debugging, but to gain insight into what your data looks like, we can run a `SELECT * FROM` query and then click "Export results as CSV" on the right side.
Example:
@ashwintumma23 Hi, I will check with our risk team and get back to you as soon as possible. Thanks.
@ashwintumma23 Hi, I got approval to submit the sample data. It just contains 20+ rows. I hope this helps.
Hi @carltal ,
Thanks for sharing the sample data. I did try to load it into my local cluster, and I do see the correct results. The only changes I had to make to your CSV above were to remove the quotes (`"`) before loading, and to explicitly specify the schema.
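For reference, the quote removal I did amounts to something like this (a small sketch; the example rows are invented, assuming every field in the received CSV was wrapped in plain double quotes):

```python
import csv

# A tiny example of the quoted CSV as received (values invented for illustration).
with open('druid_data_quoted.csv', 'w', newline='') as f:
    f.write('"ts","merchant_id","payment_card_type"\n')
    f.write('"2023-01-01T00:00:00Z","M101","VISA"\n')
    f.write('"2023-01-01T01:00:00Z","M101",""\n')

# csv.reader strips the surrounding quotes while parsing, so rewriting the rows
# with csv.writer (minimal quoting) yields an unquoted file for ingestion.
with open('druid_data_quoted.csv', newline='') as src:
    rows = list(csv.reader(src))

with open('druid_data_clean.csv', 'w', newline='') as dst:
    csv.writer(dst).writerows(rows)

print(open('druid_data_clean.csv').read().splitlines()[0])
# ts,merchant_id,payment_card_type
```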
Some screenshots for Data Loading
- Here's the ingestion spec that was used:
```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "sqlIssueDebug",
        "filter": "druid_data.csv"
      },
      "inputFormat": {
        "type": "csv",
        "skipHeaderRows": 1,
        "findColumnsFromHeader": false,
        "columns": [
          "ts",
          "merchant_id",
          "payment_card_type"
        ]
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "sample_payment_data_csv_format",
      "timestampSpec": {
        "column": "ts",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "long",
            "name": "merchant_id"
          },
          "payment_card_type"
        ]
      },
      "granularitySpec": {
        "queryGranularity": "none",
        "rollup": false,
        "segmentGranularity": "hour"
      }
    }
  }
}
```
Querying
- Shows only the `not null` and non-`AMEX` values
We can also have a screen-sharing session to discuss further. You can find me on the Apache Druid Community Slack.
@ashwintumma23 Hi, thank you for the quick turnaround. I can try what you did in a new datasource and let you know what I see.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.