metabase-clickhouse-driver icon indicating copy to clipboard operation
metabase-clickhouse-driver copied to clipboard

FieldFilter variables mapped to DateTime datatype don't show results correctly

Open anaklime opened this issue 1 year ago • 12 comments

Describe the bug

Hello. I didn't knew how to lable it - bug or suggestion, but IMHO this looks like a bug.

I've discovered, that Field Filter doesn't work "correctly" if being mapped to DateTime datatype. All Filter Widget types for dates are affected, except one - Single Date type.

Steps to reproduce

We are checking result of the following query for Date Filter Filter Widget type:

select date_time
from date_filter
[[where {{date_time}}]]
order by date_time desc
Filter option Where clause Actual Result Expected Result
Today testing.date_filter.date_time = toDate('2024-03-19') We got data only for midnight time: 19/3/2024, 00:00(check first screenshot [^1]) Should show all timestamps for 19/3/2024
Yesterday testing.date_filter.date_time = toDate('2024-03-18') We got data only for midnight time: 18/3/2024, 00:00(check first screenshot [^2]) Should show all timestamps for 18/3/2024
Last 7 days testing.date_filter.date_time BETWEEN toDate('2024-03-12') AND toDate('2024-03-18') For 18/3/2024 shows only midnight time: 18/3/2024, 00:00(check first screenshot [^3]) Should show all timestamps between 11/3/2024 till 18/3/2024 including timestamps from 18/3/2024
Specific dates -> Between without time testing.date_filter.date_time BETWEEN toDate('2024-03-11') AND toDate('2024-03-12') For 12/3/2024 shows only midnight time: 12/3/2024, 00:00(check first screenshot [^4]) Should show all timestamps for 12/3/2024
Specific dates -> Between with time testing.date_filter.date_time BETWEEN '2024-03-11 00:00:00.000' AND '2024-03-12 23:59:00.000' It didn't showed record for 2024-03-12 where time was 23:59:59(check first screenshot [^5]) Should show all timestamps for 2024-03-12 23:59
Specific dates -> After testing.date_filter.date_time > toDate('2024-03-18') We're missing midnight for 2024-03-18, and show other records for that day(screenshot [^6]) Shouldn't show any timestamps for 2024-03-18
Specific dates -> On CAST(testing.date_filter.date_time AS date) = toDate('2024-03-12') As expected - we got all records for 2024-03-12 (check screenshot[^7]) All timestamps for 2024-03-12 had been shown

Following table will be all affected with same issue as above, so i'll just mention

Filter type (& option) Where clause
Month and Year testing.date_filter.date_time BETWEEN toDate('2023-05-01') AND toDate('2023-05-31')
Quater and Year where testing.date_filter.date_time BETWEEN toDate('2023-07-01') AND toDate('2023-09-30')
Data Range without time testing.date_filter.date_time BETWEEN toDate('2024-03-18') AND toDate('2024-03-19')
Data Range with time testing.date_filter.date_time BETWEEN '2024-03-18 00:00:00.000' AND '2024-03-19 23:59:00.000'
Data Range single date testing.date_filter.date_time = toDate('2024-03-18')
Relative date Basically the same as for Date Filter options

And here is table for Single Date Filter Widget type:

Filter option Where clause Result
Just date CAST(testing.date_filter.date_time AS date) = toDate('2024-03-19') As we wanted - all records for 2024-03-19 are shown by this query (screenshot [^8])
Date with time toStartOfMinute(testing.date_filter.date_time) = '2024-03-19 23:59:00.000' It showed every record for 23:59, as we expected(screenshot [^9])

These works perfectly with DateTime datatype, and negate all "issues" mentioned above. Can we somehow implement same "methods" for other Filter Widgets?

Configuration

Environment

  • metabase-clickhouse-driver version: 1.4.0
  • metabase-clickhouse-driver configuration: default
  • Metabase version: 0.49.0
  • OS: Oracle Linux Server 8.6

ClickHouse server

  • ClickHouse Server version: 23.8.9.54 (official build)
  • CREATE TABLE statements for tables involved:

CREATE TABLE date_filter ( date_time DateTime('Europe/Moscow'), ) ENGINE = TinyLog

This is example of DateTime records from the table - 5 records with different time for a day:

2023-03-25 00:00:00 2023-03-25 06:30:34 2023-03-25 12:00:00 2023-03-25 18:45:58 2023-03-25 23:59:59 2023-03-26 00:00:00

Where max value for date_time is 2024-03-19 23:59:59 and min value for date_time is 2023-03-25 00:00:00

Screenshots: [^1]:image [^2]:image [^3]:image [^4]:image [^5]:image [^6]:image [^7]:image [^8]:image [^9]:image

anaklime avatar Mar 19 '24 14:03 anaklime

Environment metabase-clickhouse-driver version: metabase-clickhouse-driver configuration: Metabase version: OS:

Could you share these? What is the driver version? OS? What is the timezone configuration (OS, CH, JVM/Metabase)?

slvrtrn avatar Mar 19 '24 14:03 slvrtrn

Yeah, i forgot to put this information there, sorry. Accidentally ctrl-enter before finished whole report .-.

anaklime avatar Mar 19 '24 14:03 anaklime

No worries. Thank you for the detailed report (emoji navigation is the next level). In the latest driver (which requires MB 0.49.x), there was a fix related to this feature; see 1.4.0.

slvrtrn avatar Mar 19 '24 14:03 slvrtrn

Added environment info. TZ configuration is the same for all of OS, CH and Metabase - UTC +3

So yeah 1.4.0 is working perfectly fine, i was wondering is there any chance to implement this to other Filter Widgets?

anaklime avatar Mar 19 '24 14:03 anaklime

I found that it could be made like this, but i have no idea how it will affect performance:

toStartOfMinute(date_time) BETWEEN '2024-03-17 23:59:00' AND '2024-03-19 23:59:00'
CAST(date_time as date) BETWEEN toDate('2024-03-18') AND toDate('2024-03-19')

anaklime avatar Mar 19 '24 15:03 anaklime

OK, I see that 1.4.0 is affected. I will have a look when I am back from vacation next week.

slvrtrn avatar Mar 20 '24 14:03 slvrtrn

@slvrtrn Just double checking on this one, hit the same issue as reported, where the filters basically exclude the start/end days of the intervals (so for example a last 7 days relative range filter will return the data only for 5 days).

ecyshor avatar Apr 01 '24 19:04 ecyshor

I tried to reproduce it with MB 0.49.2, driver version 1.4.0, ClickHouse 24.3, and this dataset (6 rows per day, with different times: 00:00, 00:59, 01:00, 06:30, 18:45, 23:59:59; i.e. expected 42 rows per week or 7 days). I also tried both UTC and Europe/Moscow timezones in ClickHouse + MB + DateTime column.

The results were correct with all filter variations in both time zones. I am pretty positive that this was fixed in https://github.com/ClickHouse/metabase-clickhouse-driver/pull/218 (similar report - https://github.com/ClickHouse/metabase-clickhouse-driver/issues/216). But if I am missing something, please let me know; I will double-check.

Here are my results:

(Single date) April 1st - all records for this date.

Screenshot from 2024-04-02 08-15-59

(Relative date) Past 7 days - as expected, 42 dates.

Screenshot from 2024-04-02 08-15-39

(Date Range with time) April 1st 12:30pm - April 2nd 12:30pm - filtered correctly.

Screenshot from 2024-04-02 08-15-13

(Date Range default) April 1st 00:00 - April 2nd 23:59

Screenshot from 2024-04-02 08-14-34

(Date filter) This week - 42 dates as expected.

Screenshot from 2024-04-02 08-13-54

(Date filter) Today (2nd of April) - only 6 dates, correct results.

Screenshot from 2024-04-02 08-13-44

(Date filter) Previous week - 42 dates as expected, March 25th - March 31st

Screenshot from 2024-04-02 08-13-34

(Date filter) Previous 7 days - 42 dates as expected, March 25th - March 31st

Screenshot from 2024-04-02 08-11-58

Regarding time zones setup: CH had either <timezone>UTC</timezone> or <timezone>Europe/Moscow</timezone> in config.xml; MB was running in Docker with either 'JAVA_TIMEZONE': 'UTC' or 'JAVA_TIMEZONE': 'Europe/Moscow', and the column was defined as either DateTime('UTC') or DateTime('Europe/Moscow').

slvrtrn avatar Apr 02 '24 06:04 slvrtrn

As i said earlier - it seems that #218 fixed some particular cases, but not the issue itself. I updated my MB to 0.49.2 version: image

driver version still 1.4.0:

[root@test ch_mt_driver]# cat metabase-plugin.yaml 
info:
  name: Metabase ClickHouse Driver
  version: 1.4.0

ClickHouse to 24.3.2 version:

[root@test clickhouse-server]# clickhouse-server --version
ClickHouse server version 24.3.2.23 (official build).

The issue itself that many filters using toDate instead of Cast, which leads to strange results.( see examples below) Generally speaking - toDate('some_date') = some_dateT00:00 .

I used same dataset for this.

Here are all examples for all filters:

Filter and Widget Type Time Period Actual Result Expected Result Comment
Field Filter: Month and Year March 2024 Well, this query won't show any data beside with midnight timestamp for last day of the month [^1] But if we use Cast it will show us all data we need [^2]
Field Filter: Quarter and Year Q1 2024 Same result as previous example - no data for last day of the quarter [^3] And the same solution - using Case will solve this issue [^2] It is pointless to make a new separate screenshot for this as queries are identical
Field Filter: Single Date 8 April Here we can see that, if we use Case there is no issue with showing all timestamps for one day [^4] We got what we expected No additional comments here
Field Filter: Date Range 8 April - 10 April Well, this is default Date Range values here - it's using toDate without Cast so we're missing some data for 10/04/2024 [^5] Same range with Case will get us missed timestamps [^6] I actually don't get it why we have different result for this one :<
Filed Filter: Date Range with time 8 April 00:00 - 10 April 18:45 Because in this clause absolute values used without any functions we are getting pretty much what we've expected [^7] <--- It could be considered as an issue, but i actually don't know if it should be - any timestamps after 18:45 won't be shown, for example 18:45:00.001. This is expected because we used absolute values with BETWEEN clause.
Field Filter: Relative Date Today No Cast in clause - no expected result. Nothing, besides midnight is shown [^8] We're expecting same result as in Single Date widget [^4] No additional comments here
Field Filter: Relative Date Yesterday Same result as in previous example [^9] Using Cast will fix this [^4] No additional comments here
Field Filter: Relative Date Past 7 days Same result as earlier with combination of BETWEEN and toDate without Cast [^10] As always Cast makes anything better [^11] I don't understand - same tests, same versions - different results ¯_(ツ)_/¯
Field Filter: Date Filter Today Same result as for Relative Date widget [^12] Using Cast will fix this [^4] No additional comments here
Field Filter: Date Filter Last Week Again usage toDate without Cast [^13] We need Cast to fix this [^6] No additional comments here
Field Filter: Date Filter This Week Same result as Last Week widget [^14] We need Cast to fix this [^6] No additional comments here
Field Filter: Date Filter Previous 7 Days Same result as earlier [^15] We need Cast to fix this [^6] No additional comments here

It's very strange that we have different results, despite using identical versions of MD, Driver and CH. Also, i'm a little bit lost in this - as far as i understood how widget creating clauses was full on driver side. Am i wrong about that?

I'll try full clean install later to be sure that i missed nothing while updating to newer versions.

And again apologies for my poor English - it's not my native language. I hope you could understand what i was writing about)

Screenshots: [^1]:image [^2]:image [^3]:image [^4]:image [^5]:image [^6]:image [^7]:image [^8]:image [^9]:image [^10]:image [^11]:image [^12]:image [^13]:image [^14]:image [^15]:image

anaklime avatar Apr 08 '24 16:04 anaklime

Thanks for the detailed answer :)

how widget creating clauses was full on driver side. Am i wrong about that?

This is correct. This is mainly handled by Metabase itself via the common code; the only tweak for CH is this.

Considering CAST calls, we had this issue: https://github.com/ClickHouse/metabase-clickhouse-driver/issues/196, which is why these were removed and that bit in the QP sources was introduced.

Regarding the statements: If you run the queries in clickhouse-client (from CLI), are the results the same as those shown in Metabase?

I suspect that this could be due to the disabled TZ switch (Related: #200, see the discussion); however, this shouldn't be the case since everything is in the same TZ on your configuration.

Lastly, could you contact me in the community Slack? I think it might be easier/faster to debug that way.

slvrtrn avatar Apr 09 '24 08:04 slvrtrn

So i was wondering if it's possible to "map" Cast(date_time as date) to toDate function, so it will appear only if toDate is used in these filters.

Yeah i saw that some issues had been mentioned as fixed in the changelog for 1.3.0 version. The #196 issue is relatable to this issue. But issue there had been caused by Cast used when it shouldn't be - with absolute values we don't need to use any functions (at least i believe so =D ).

If i run queries in CLI i have the same result.

Also, i perform a clean install of MB, MB driver and CH - latest versions for all mentioned, and still got same results

I don't think that this could be related to TZ issue, even if TZ would differ in my config - i believe TZ issue would show results which differ from expected for + or - 3 hours. ( If only +/-3 UTC is the only difference between configured timezones )

Unfortunately, Slack is restricted in my country and i can't join this community. Is it possible to communicate via different method/messenger? Telegram, perhaps?

anaklime avatar Apr 09 '24 12:04 anaklime

Unfortunately, Slack is restricted in my country and i can't join this community. Is it possible to communicate via different method/messenger? Telegram, perhaps?

Feel free to contact me there. Same username.

slvrtrn avatar Apr 10 '24 12:04 slvrtrn

This should be resolved as of 1.5.1+. Please feel free to re-open or create a new one if the issue occurs again.

slvrtrn avatar Jun 11 '24 20:06 slvrtrn