metabase-clickhouse-driver
metabase-clickhouse-driver copied to clipboard
FieldFilter variables mapped to DateTime datatype don't show results correctly
Describe the bug
Hello. I didn't knew how to lable it - bug or suggestion, but IMHO this looks like a bug.
I've discovered, that Field Filter doesn't work "correctly" if being mapped to DateTime datatype. All Filter Widget types for dates are affected, except one - Single Date type.
Steps to reproduce
We are checking result of the following query for Date Filter Filter Widget type:
select date_time
from date_filter
[[where {{date_time}}]]
order by date_time desc
| Filter option | Where clause | Actual Result | Expected Result |
|---|---|---|---|
| Today | testing.date_filter.date_time = toDate('2024-03-19') | We got data only for midnight time: 19/3/2024, 00:00(check first screenshot [^1]) | Should show all timestamps for 19/3/2024 |
| Yesterday | testing.date_filter.date_time = toDate('2024-03-18') |
We got data only for midnight time: 18/3/2024, 00:00(check first screenshot [^2]) | Should show all timestamps for 18/3/2024 |
| Last 7 days | testing.date_filter.date_time BETWEEN toDate('2024-03-12') AND toDate('2024-03-18') |
For 18/3/2024 shows only midnight time: 18/3/2024, 00:00(check first screenshot [^3]) | Should show all timestamps between 11/3/2024 till 18/3/2024 including timestamps from 18/3/2024 |
| Specific dates -> Between without time | testing.date_filter.date_time BETWEEN toDate('2024-03-11') AND toDate('2024-03-12') |
For 12/3/2024 shows only midnight time: 12/3/2024, 00:00(check first screenshot [^4]) | Should show all timestamps for 12/3/2024 |
| Specific dates -> Between with time | testing.date_filter.date_time BETWEEN '2024-03-11 00:00:00.000' AND '2024-03-12 23:59:00.000' |
It didn't showed record for 2024-03-12 where time was 23:59:59(check first screenshot [^5]) | Should show all timestamps for 2024-03-12 23:59 |
| Specific dates -> After | testing.date_filter.date_time > toDate('2024-03-18') |
We're missing midnight for 2024-03-18, and show other records for that day(screenshot [^6]) | Shouldn't show any timestamps for 2024-03-18 |
| Specific dates -> On | CAST(testing.date_filter.date_time AS date) = toDate('2024-03-12') |
As expected - we got all records for 2024-03-12 (check screenshot[^7]) | All timestamps for 2024-03-12 had been shown |
Following table will be all affected with same issue as above, so i'll just mention
| Filter type (& option) | Where clause |
|---|---|
Month and Year |
testing.date_filter.date_time BETWEEN toDate('2023-05-01') AND toDate('2023-05-31') |
Quater and Year |
where testing.date_filter.date_time BETWEEN toDate('2023-07-01') AND toDate('2023-09-30') |
Data Range without time |
testing.date_filter.date_time BETWEEN toDate('2024-03-18') AND toDate('2024-03-19') |
Data Range with time |
testing.date_filter.date_time BETWEEN '2024-03-18 00:00:00.000' AND '2024-03-19 23:59:00.000' |
Data Range single date |
testing.date_filter.date_time = toDate('2024-03-18') |
| Relative date | Basically the same as for Date Filter options |
And here is table for Single Date Filter Widget type:
| Filter option | Where clause | Result |
|---|---|---|
| Just date | CAST(testing.date_filter.date_time AS date) = toDate('2024-03-19') |
As we wanted - all records for 2024-03-19 are shown by this query (screenshot [^8]) |
| Date with time | toStartOfMinute(testing.date_filter.date_time) = '2024-03-19 23:59:00.000' |
It showed every record for 23:59, as we expected(screenshot [^9]) |
These works perfectly with DateTime datatype, and negate all "issues" mentioned above.
Can we somehow implement same "methods" for other Filter Widgets?
Configuration
Environment
- metabase-clickhouse-driver version: 1.4.0
- metabase-clickhouse-driver configuration: default
- Metabase version: 0.49.0
- OS: Oracle Linux Server 8.6
ClickHouse server
- ClickHouse Server version: 23.8.9.54 (official build)
CREATE TABLEstatements for tables involved:
CREATE TABLE date_filter
(
date_time DateTime('Europe/Moscow'),
)
ENGINE = TinyLog
This is example of DateTime records from the table - 5 records with different time for a day:
2023-03-25 00:00:00 2023-03-25 06:30:34 2023-03-25 12:00:00 2023-03-25 18:45:58 2023-03-25 23:59:59 2023-03-26 00:00:00
Where max value for date_time is 2024-03-19 23:59:59 and min value for date_time is 2023-03-25 00:00:00
Screenshots:
[^1]:
[^2]:
[^3]:
[^4]:
[^5]:
[^6]:
[^7]:
[^8]:
[^9]:
Environment metabase-clickhouse-driver version: metabase-clickhouse-driver configuration: Metabase version: OS:
Could you share these? What is the driver version? OS? What is the timezone configuration (OS, CH, JVM/Metabase)?
Yeah, i forgot to put this information there, sorry. Accidentally ctrl-enter before finished whole report .-.
No worries. Thank you for the detailed report (emoji navigation is the next level). In the latest driver (which requires MB 0.49.x), there was a fix related to this feature; see 1.4.0.
Added environment info. TZ configuration is the same for all of OS, CH and Metabase - UTC +3
So yeah 1.4.0 is working perfectly fine, i was wondering is there any chance to implement this to other Filter Widgets?
I found that it could be made like this, but i have no idea how it will affect performance:
toStartOfMinute(date_time) BETWEEN '2024-03-17 23:59:00' AND '2024-03-19 23:59:00'
CAST(date_time as date) BETWEEN toDate('2024-03-18') AND toDate('2024-03-19')
OK, I see that 1.4.0 is affected. I will have a look when I am back from vacation next week.
@slvrtrn Just double checking on this one, hit the same issue as reported, where the filters basically exclude the start/end days of the intervals (so for example a last 7 days relative range filter will return the data only for 5 days).
I tried to reproduce it with MB 0.49.2, driver version 1.4.0, ClickHouse 24.3, and this dataset (6 rows per day, with different times: 00:00, 00:59, 01:00, 06:30, 18:45, 23:59:59; i.e. expected 42 rows per week or 7 days). I also tried both UTC and Europe/Moscow timezones in ClickHouse + MB + DateTime column.
The results were correct with all filter variations in both time zones. I am pretty positive that this was fixed in https://github.com/ClickHouse/metabase-clickhouse-driver/pull/218 (similar report - https://github.com/ClickHouse/metabase-clickhouse-driver/issues/216). But if I am missing something, please let me know; I will double-check.
Here are my results:
(Single date) April 1st - all records for this date.
(Relative date) Past 7 days - as expected, 42 dates.
(Date Range with time) April 1st 12:30pm - April 2nd 12:30pm - filtered correctly.
(Date Range default) April 1st 00:00 - April 2nd 23:59
(Date filter) This week - 42 dates as expected.
(Date filter) Today (2nd of April) - only 6 dates, correct results.
(Date filter) Previous week - 42 dates as expected, March 25th - March 31st
(Date filter) Previous 7 days - 42 dates as expected, March 25th - March 31st
Regarding time zones setup: CH had either <timezone>UTC</timezone> or <timezone>Europe/Moscow</timezone> in config.xml; MB was running in Docker with either 'JAVA_TIMEZONE': 'UTC' or 'JAVA_TIMEZONE': 'Europe/Moscow', and the column was defined as either DateTime('UTC') or DateTime('Europe/Moscow').
As i said earlier - it seems that #218 fixed some particular cases, but not the issue itself.
I updated my
MB to 0.49.2 version:
driver version still 1.4.0:
[root@test ch_mt_driver]# cat metabase-plugin.yaml
info:
name: Metabase ClickHouse Driver
version: 1.4.0
ClickHouse to 24.3.2 version:
[root@test clickhouse-server]# clickhouse-server --version
ClickHouse server version 24.3.2.23 (official build).
The issue itself that many filters using toDate instead of Cast, which leads to strange results.( see examples below)
Generally speaking - toDate('some_date') = some_dateT00:00 .
I used same dataset for this.
Here are all examples for all filters:
| Filter and Widget Type | Time Period | Actual Result | Expected Result | Comment | |
|---|---|---|---|---|---|
| ✗ | Field Filter: Month and Year | March 2024 | Well, this query won't show any data beside with midnight timestamp for last day of the month [^1] | But if we use Cast it will show us all data we need [^2] |
|
| ✗ | Field Filter: Quarter and Year | Q1 2024 | Same result as previous example - no data for last day of the quarter [^3] | And the same solution - using Case will solve this issue [^2] |
It is pointless to make a new separate screenshot for this as queries are identical |
| ✓ | Field Filter: Single Date | 8 April | Here we can see that, if we use Case there is no issue with showing all timestamps for one day [^4] |
We got what we expected | No additional comments here |
| ✗ | Field Filter: Date Range | 8 April - 10 April | Well, this is default Date Range values here - it's using toDate without Cast so we're missing some data for 10/04/2024 [^5] |
Same range with Case will get us missed timestamps [^6] |
I actually don't get it why we have different result for this one :< |
| ✓ | Filed Filter: Date Range with time | 8 April 00:00 - 10 April 18:45 | Because in this clause absolute values used without any functions we are getting pretty much what we've expected [^7] | <--- | It could be considered as an issue, but i actually don't know if it should be - any timestamps after 18:45 won't be shown, for example 18:45:00.001. This is expected because we used absolute values with BETWEEN clause. |
| ✗ | Field Filter: Relative Date | Today | No Cast in clause - no expected result. Nothing, besides midnight is shown [^8] |
We're expecting same result as in Single Date widget [^4] |
No additional comments here |
| ✗ | Field Filter: Relative Date | Yesterday | Same result as in previous example [^9] | Using Cast will fix this [^4] |
No additional comments here |
| ✗ | Field Filter: Relative Date | Past 7 days | Same result as earlier with combination of BETWEEN and toDate without Cast [^10] |
As always Cast makes anything better [^11] |
I don't understand - same tests, same versions - different results ¯_(ツ)_/¯ |
| ✗ | Field Filter: Date Filter | Today | Same result as for Relative Date widget [^12] |
Using Cast will fix this [^4] |
No additional comments here |
| ✗ | Field Filter: Date Filter | Last Week | Again usage toDate without Cast [^13] |
We need Cast to fix this [^6] |
No additional comments here |
| ✗ | Field Filter: Date Filter | This Week | Same result as Last Week widget [^14] |
We need Cast to fix this [^6] |
No additional comments here |
| ✗ | Field Filter: Date Filter | Previous 7 Days | Same result as earlier [^15] | We need Cast to fix this [^6] |
No additional comments here |
It's very strange that we have different results, despite using identical versions of MD, Driver and CH. Also, i'm a little bit lost in this - as far as i understood how widget creating clauses was full on driver side. Am i wrong about that?
I'll try full clean install later to be sure that i missed nothing while updating to newer versions.
And again apologies for my poor English - it's not my native language. I hope you could understand what i was writing about)
Screenshots:
[^1]:
[^2]:
[^3]:
[^4]:
[^5]:
[^6]:
[^7]:
[^8]:
[^9]:
[^10]:
[^11]:
[^12]:
[^13]:
[^14]:
[^15]:
Thanks for the detailed answer :)
how widget creating clauses was full on driver side. Am i wrong about that?
This is correct. This is mainly handled by Metabase itself via the common code; the only tweak for CH is this.
Considering CAST calls, we had this issue: https://github.com/ClickHouse/metabase-clickhouse-driver/issues/196, which is why these were removed and that bit in the QP sources was introduced.
Regarding the statements: If you run the queries in clickhouse-client (from CLI), are the results the same as those shown in Metabase?
I suspect that this could be due to the disabled TZ switch (Related: #200, see the discussion); however, this shouldn't be the case since everything is in the same TZ on your configuration.
Lastly, could you contact me in the community Slack? I think it might be easier/faster to debug that way.
So i was wondering if it's possible to "map" Cast(date_time as date) to toDate function, so it will appear only if toDate is used in these filters.
Yeah i saw that some issues had been mentioned as fixed in the changelog for 1.3.0 version.
The #196 issue is relatable to this issue. But issue there had been caused by Cast used when it shouldn't be - with absolute values we don't need to use any functions (at least i believe so =D ).
If i run queries in CLI i have the same result.
Also, i perform a clean install of MB, MB driver and CH - latest versions for all mentioned, and still got same results
I don't think that this could be related to TZ issue, even if TZ would differ in my config - i believe TZ issue would show results which differ from expected for + or - 3 hours. ( If only +/-3 UTC is the only difference between configured timezones )
Unfortunately, Slack is restricted in my country and i can't join this community. Is it possible to communicate via different method/messenger? Telegram, perhaps?
Unfortunately, Slack is restricted in my country and i can't join this community. Is it possible to communicate via different method/messenger? Telegram, perhaps?
Feel free to contact me there. Same username.
This should be resolved as of 1.5.1+. Please feel free to re-open or create a new one if the issue occurs again.