metabase-clickhouse-driver icon indicating copy to clipboard operation
metabase-clickhouse-driver copied to clipboard

Incorrect count result when querying a View over a Distributed table

Open dvgorbunkov opened this issue 2 years ago • 11 comments

HI! I'm using metabase version 0.44.1 with version clickhouse driver 0.8.3. Found a problem that a simple query

select count(distinct(orderId)) from AttributedPartnerOrders 
where partnerId = '...'
and createdDate >= '2023-01-01'
and createdDate <= '2023-01-31'
and clickedMinutesAgo < 1440

produces a result 6046, but if you repeat it, it gives 5971.

The numbers are constantly changing.

Through the standard clickhouse-client, the data is given correctly and the same. In the metabase logs we see only regular lines for this request

DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 244.3 ms (6 DB calls) App DB connections: 1/15 Jetty threads: 2/50 (5 idle, 0 queued) (100 total active threads) Queries in flight: 1 (0 queued); clickhouse DB 2 connections: 0/1 (0 threads blocked)

dvgorbunkov avatar Feb 15 '23 13:02 dvgorbunkov

Hi, thanks for the report.

Having the sample cell towers dataset, I tried to do something similar using Metabase 0.45.2 and plugin version 1.0.3:

SELECT count(distinct(area))
FROM `default`.`cell_towers`
WHERE radio = 'CDMA' 
AND updated >= '2011-07-01'
AND updated <= '2011-07-31';

It produces the same result in the Metabase interface and the standard clickhouse-client.

Can you please try upgrading the driver (to at least 0.9.0 as the underlying JDBC driver has been changed there) and then check if the issue persists?

Cheers.

slvrtrn avatar Feb 15 '23 14:02 slvrtrn

@slvrtrn In the previous changelog in the driver, I didn't find a mention of my case, that's why I wrote here about the problem. Thanks for the advice. I'll definitely update and come back here with the result.

dvgorbunkov avatar Feb 15 '23 14:02 dvgorbunkov

Unfortunately the update didn't help. I installed the latest metabase and the latest clickhouse driver (1.0.3). The problem is reproduced.

{
  "browser-info": {
    "language": "ru-RU",
    "platform": "Linux x86_64",
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.18+10",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.18",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.18+10",
    "os.name": "Linux",
    "os.version": "5.10.0-13-amd64",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "h2",
      "clickhouse"
    ],
    "hosting-env": "unknown",
    "application-database": "h2",
    "application-database-details": {
      "database": {
        "name": "H2",
        "version": "1.4.197 (2018-03-18)"
      },
      "jdbc-driver": {
        "name": "H2 JDBC Driver",
        "version": "1.4.197 (2018-03-18)"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-01-27",
      "tag": "v0.45.2.1",
      "branch": "release-x.45.2.x",
      "hash": "1a59de7"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

P.S. This is very strange, but if you make a request like

select 
  count(*) 
from 
  (
    select 
      distinct(orderId) 
    from 
      AttributedPartnerOrders 
    where 
      partnerId = '...' 
      and createdDate >= '2023-01-01' 
      and createdDate <= '2023-01-31' 
      and clickedMinutesAgo < 1440
  )

then the result is correct and does not change if I repeat it again.

dvgorbunkov avatar Feb 15 '23 21:02 dvgorbunkov

Thanks for trying it out.

That is odd indeed!

Could you provide the table structure, maybe with a limited number of fields (the ones used in the query at least), but with exact ClickHouse types?

I will generate some dummy data and try to reproduce it on my end.

slvrtrn avatar Feb 15 '23 22:02 slvrtrn

Thank you for taking such good care of my question.

CREATE VIEW AttributedPartnerOrders
(
    `partnerId` FixedString(24),
    `visitorId` Nullable(FixedString(24)),
    `createdAt` DateTime,
    `createdDate` Date,
    `itemId` Nullable(Int64),
    `orderId` Nullable(String),
    `qnt` Nullable(Float64),
    `amount` Nullable(Float64),
    `clickedAt` DateTime,
    `clickedMinutesAgo` Float64,
    `blockId` FixedString(32),
    `method` Nullable(String),
    `email` Nullable(String),
    `updatedAt` DateTime
) AS
SELECT *
FROM AttributedPartnerOrdersMaterialized

dvgorbunkov avatar Feb 15 '23 23:02 dvgorbunkov

Thanks. Will have a look.

slvrtrn avatar Feb 15 '23 23:02 slvrtrn

I still cannot reproduce this.

Here's what I did:

CREATE TABLE AttributedPartnerOrdersMaterialized
(
    `partnerId`         FixedString(24),
    `createdDate`       Date,
    `orderId`           Nullable(String),
    `clickedMinutesAgo` Float64
) ENGINE MergeTree ORDER BY partnerId;

CREATE VIEW AttributedPartnerOrders
(
    `partnerId`         FixedString(24),
    `createdDate`       Date,
    `orderId`           Nullable(String),
    `clickedMinutesAgo` Float64
)
AS
SELECT *
FROM AttributedPartnerOrdersMaterialized;

so I'd have some minimal schema setup for this test.

I generated 1 million random rows, such as

image

10 unique partner ids, 100k unique orderIds there

running several queries such as

select count(distinct(orderId)) from AttributedPartnerOrders 
where partnerId = 'c359c87522234e459890fd88'
and createdDate >= '2023-01-01'
and createdDate <= '2023-01-31'
and clickedMinutesAgo < 1440

yielded the same result in both Metabase and ClickHouse.

I think we are missing something else here...

To help with the investigation, could you provide more details:

  • what is your ClickHouse server version
  • what is the table engine you are using?
  • is it a single node or a cluster setup?

to speed up the process, maybe you could ping me in the ClickHouse Community Slack

slvrtrn avatar Feb 17 '23 13:02 slvrtrn

I have 4 servers with clickhouse version 22.8.4.7. Two sharded servers, each with one additional server with a replica. I have the following table schema

CREATE TABLE AttributedPartnerOrdersMaterialized_local ON CLUSTER my_cluster (
    `partnerId` FixedString(24),
    `createdDate` Date,
    `itemId` Nullable(Int64),
    `clickedMinutesAgo` Float64,
...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/AttributedPartnerOrdersMaterialized_local', '{replica}')
PARTITION BY toYYYYMMDD(createdDate) ORDER BY (partnerId, blockId) SETTINGS index_granularity = 8192;

CREATE TABLE AttributedPartnerOrdersMaterialized ON CLUSTER my_cluster (
    `partnerId` FixedString(24),
    `createdDate` Date,
    `itemId` Nullable(Int64),
    `clickedMinutesAgo` Float64,
...
) ENGINE = Distributed('my_cluster', 'mydb', 'AttributedPartnerOrdersMaterialized_local', rand());


CREATE VIEW AttributedPartnerOrders ON CLUSTER my_cluster (
    `partnerId` FixedString(24),
    `createdDate` Date,
    `itemId` Nullable(Int64),
    `clickedMinutesAgo` Float64,
...
) AS SELECT * FROM AttributedPartnerOrdersMaterialized;

As a result, if we make a query on the Distributed table (AttributedPartnerOrdersMaterialized), we get the correct data that does not change if we repeat the query. However, if we make query view table (AttributedPartnerOrders), then we get a random number. This is still strange, but we will be satisfied with the option of transferring queries to another table.

Thanks for pointing out the data model. We have an obvious error.

dvgorbunkov avatar Feb 20 '23 11:02 dvgorbunkov

However, if we make query view table (AttributedPartnerOrders), then we get a random number.

is it the case for clickhouse-client as well or only Metabase?

slvrtrn avatar Feb 20 '23 13:02 slvrtrn

is it the case for clickhouse-client as well or only Metabase?

only Metabase

dvgorbunkov avatar Feb 20 '23 13:02 dvgorbunkov

Glad to hear that it works for you now. I will leave this issue open as we still need to investigate why it behaves awkwardly when using a View over a Distributed table.

slvrtrn avatar Feb 20 '23 15:02 slvrtrn

I'm closing this for now, as it is still unclear what the root cause was and if it's still an issue in the recent driver releases. If there are more details and/or minimal repro available now, feel free to re-open or create a new one.

slvrtrn avatar Jun 11 '24 20:06 slvrtrn