Rolling windows for ClickHouse
**Is your feature request related to a problem? Please describe.** I want to use rolling windows in ClickHouse, but I cannot get them to work when a granularity is applied. Looking at the tests running for ClickHouse, I guess this is not supported?

**Describe the solution you'd like** Rolling windows work with a granularity applied.

**Describe alternatives you've considered** At least document that it is not supported so people don't waste time. Perhaps include an alternative.

**Additional context** Ref. https://github.com/cube-js/cube/blob/eab3ce906e1b0af1551057b567d01d96b635ed01/packages/cubejs-schema-compiler/test/integration/clickhouse/custom-granularities.test.ts#L329
I think ClickHouse now supports this, ref. https://clickhouse.com/docs/sql-reference/statements/select/join#join-with-inequality-conditions-for-columns-from-different-tables

For this you need `join_use_nulls=0`, which Cube sets when ClickHouse is read-only.
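For illustration, the linked docs describe inequality conditions in the `ON` clause along the lines of the sketch below. The table and column names here are hypothetical, and on older ClickHouse versions such a query is only accepted with the experimental setting enabled:

```sql
-- Hypothetical tables t1(key, a) and t2(key, r); illustrates an inequality
-- condition in the ON clause of a LEFT JOIN, per the linked ClickHouse docs.
-- Older ClickHouse versions only accept this with the setting below.
SELECT t1.*, t2.*
FROM t1
LEFT JOIN t2
  ON t1.key = t2.key AND t1.a < t2.r
SETTINGS allow_experimental_join_condition = 1
```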
@mjvankampen Does it not work for you with any granularity, or only with custom granularities? It would be great to see the data model + query that failed for you.
@igorlukanin For example, using the data model from the docs with the query used there (I did add a date range), I get the following error with a standard granularity. Querying the raw value without a granularity does work, but that's typically not very useful.
```yaml
cubes:
  - name: orders
    sql: >
      SELECT 1 AS id, '2025-01-01'::TIMESTAMP AS time UNION ALL
      SELECT 2 AS id, '2025-01-11'::TIMESTAMP AS time UNION ALL
      SELECT 3 AS id, '2025-01-21'::TIMESTAMP AS time UNION ALL
      SELECT 4 AS id, '2025-01-31'::TIMESTAMP AS time UNION ALL
      SELECT 5 AS id, '2025-02-01'::TIMESTAMP AS time UNION ALL
      SELECT 6 AS id, '2025-02-11'::TIMESTAMP AS time UNION ALL
      SELECT 7 AS id, '2025-02-21'::TIMESTAMP AS time UNION ALL
      SELECT 8 AS id, '2025-03-01'::TIMESTAMP AS time UNION ALL
      SELECT 9 AS id, '2025-03-11'::TIMESTAMP AS time UNION ALL
      SELECT 10 AS id, '2025-03-21'::TIMESTAMP AS time UNION ALL
      SELECT 11 AS id, '2025-03-31'::TIMESTAMP AS time UNION ALL
      SELECT 12 AS id, '2025-04-01'::TIMESTAMP AS time

    dimensions:
      - name: time
        sql: time
        type: time

    measures:
      - name: rolling_count_month
        sql: id
        type: count
        rolling_window:
          trailing: unbounded
```
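The exact query was not included above; a query of the shape below, following Cube's documented REST query format (measure and dimension names taken from the data model, date range illustrative), is what would exercise this model with a standard granularity:

```json
{
  "measures": ["orders.rolling_count_month"],
  "timeDimensions": [
    {
      "dimension": "orders.time",
      "granularity": "month",
      "dateRange": ["2025-01-01", "2025-04-30"]
    }
  ]
}
```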
I confirm it's a specific issue between ClickHouse and Cube. I created a CH table:

And I have the following error when reproducing the test:

And the same definition as in the example:

I even tried to add `allow_experimental_join_condition` in the Cube SQL definition, without success:

Here is the SQL generated by Cube:
```sql
SELECT
  q_0.period_to_date__time_month,
  period_to_date__revenue_ytd period_to_date__revenue_ytd
FROM (
  SELECT
    period_to_date.time_series.date_from period_to_date__time_month,
    sum(period_to_date__revenue_ytd) period_to_date__revenue_ytd
  FROM (
    SELECT
      parseDateTimeBestEffort(arrayJoin(['2024-07-01T00:00:00.000','2024-08-01T00:00:00.000','2024-09-01T00:00:00.000','2024-10-01T00:00:00.000','2024-11-01T00:00:00.000','2024-12-01T00:00:00.000','2025-01-01T00:00:00.000','2025-02-01T00:00:00.000','2025-03-01T00:00:00.000','2025-04-01T00:00:00.000','2025-05-01T00:00:00.000','2025-06-01T00:00:00.000','2025-07-01T00:00:00.000'])) AS date_from,
      parseDateTimeBestEffort(arrayJoin(['2024-07-31T23:59:59.999','2024-08-31T23:59:59.999','2024-09-30T23:59:59.999','2024-10-31T23:59:59.999','2024-11-30T23:59:59.999','2024-12-31T23:59:59.999','2025-01-31T23:59:59.999','2025-02-28T23:59:59.999','2025-03-31T23:59:59.999','2025-04-30T23:59:59.999','2025-05-31T23:59:59.999','2025-06-30T23:59:59.999','2025-07-31T23:59:59.999'])) AS date_to
  ) AS period_to_date.time_series
  LEFT JOIN (
    SELECT
      toDateTime64(toStartOfMonth(toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC'), 'UTC'), 0, 'UTC') period_to_date__time_month,
      sum(period_to_date_revenue_ytd_cumulative__period_to_date.revenue) period_to_date__revenue_ytd
    FROM (
      SELECT * FROM t12.test SETTINGS allow_experimental_join_condition = 1
    ) AS period_to_date_revenue_ytd_cumulative__period_to_date
    WHERE (
      toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') >= toDateTime64(toStartOfYear(parseDateTimeBestEffort(?), 'UTC'), 0, 'UTC')
      AND toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') <= parseDateTimeBestEffort(?)
    )
    GROUP BY period_to_date__time_month
  ) AS period_to_date_revenue_ytd_cumulative__base
    ON period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64(toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC')
    AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month <= period_to_date.time_series.date_to
  GROUP BY period_to_date__time_month
) AS q_0
ORDER BY period_to_date__time_month ASC
LIMIT 10000
```
We tried a more recent version of ClickHouse so that experimental join conditions are accepted by default: ClickHouse server version 25.5.2.47 (official build).

We now have the same issue as @mjvankampen.

After rewriting the query sent by Cube to ClickHouse, we found that it may be caused by the LEFT JOIN (https://github.com/ClickHouse/ClickHouse/issues/76317). Once we replace it with an INNER JOIN, the query works (but the result can obviously be wrong).
Here is the original query :
```sql
SELECT
  q_0.period_to_date__time_month,
  period_to_date__revenue_ytd period_to_date__revenue_ytd
FROM (
  SELECT
    period_to_date.time_series.date_from period_to_date__time_month,
    sum(period_to_date__revenue_ytd) period_to_date__revenue_ytd
  FROM (
    SELECT
      parseDateTimeBestEffort(arrayJoin(['2024-01-01T00:00:00.000','2024-02-01T00:00:00.000','2024-03-01T00:00:00.000','2024-04-01T00:00:00.000','2024-05-01T00:00:00.000','2024-06-01T00:00:00.000','2024-07-01T00:00:00.000','2024-08-01T00:00:00.000','2024-09-01T00:00:00.000','2024-10-01T00:00:00.000','2024-11-01T00:00:00.000','2024-12-01T00:00:00.000','2025-01-01T00:00:00.000','2025-02-01T00:00:00.000','2025-03-01T00:00:00.000','2025-04-01T00:00:00.000','2025-05-01T00:00:00.000','2025-06-01T00:00:00.000','2025-07-01T00:00:00.000'])) AS date_from,
      parseDateTimeBestEffort(arrayJoin(['2024-01-31T23:59:59.999','2024-02-29T23:59:59.999','2024-03-31T23:59:59.999','2024-04-30T23:59:59.999','2024-05-31T23:59:59.999','2024-06-30T23:59:59.999','2024-07-31T23:59:59.999','2024-08-31T23:59:59.999','2024-09-30T23:59:59.999','2024-10-31T23:59:59.999','2024-11-30T23:59:59.999','2024-12-31T23:59:59.999','2025-01-31T23:59:59.999','2025-02-28T23:59:59.999','2025-03-31T23:59:59.999','2025-04-30T23:59:59.999','2025-05-31T23:59:59.999','2025-06-30T23:59:59.999','2025-07-31T23:59:59.999'])) AS date_to
  ) AS period_to_date.time_series
  LEFT JOIN (
    SELECT
      toDateTime64(toStartOfMonth(toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC'), 'UTC'), 0, 'UTC') period_to_date__time_month,
      sum(period_to_date_revenue_ytd_cumulative__period_to_date.revenue) period_to_date__revenue_ytd
    FROM default.test AS period_to_date_revenue_ytd_cumulative__period_to_date
    WHERE (
      toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') >= toDateTime64(toStartOfYear(parseDateTimeBestEffort('2024-01-01T00:00:00.000'), 'UTC'), 0, 'UTC')
      AND toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') <= parseDateTimeBestEffort('2025-07-31T23:59:59.999')
    )
    GROUP BY period_to_date__time_month
  ) AS period_to_date_revenue_ytd_cumulative__base
    ON period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64(toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC')
    AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month <= period_to_date.time_series.date_to
  GROUP BY period_to_date__time_month
) AS q_0
ORDER BY period_to_date__time_month ASC
LIMIT 10000
```
And the updated query :
```sql
SELECT
  q_0.period_to_date__time_month,
  period_to_date__revenue_ytd period_to_date__revenue_ytd
FROM (
  SELECT
    period_to_date.time_series.date_from period_to_date__time_month,
    sum(period_to_date__revenue_ytd) period_to_date__revenue_ytd
  FROM (
    SELECT
      parseDateTimeBestEffort(arrayJoin(['2024-01-01','2024-02-01','2024-03-01','2024-04-01','2024-05-01','2024-06-01','2024-07-01','2024-08-01','2024-09-01','2024-10-01','2024-11-01','2024-12-01','2025-01-01','2025-02-01','2025-03-01','2025-04-01','2025-05-01','2025-06-01','2025-07-01'])) AS date_from,
      parseDateTimeBestEffort(arrayJoin(['2024-01-31','2024-02-29','2024-03-31','2024-04-30','2024-05-31','2024-06-30','2024-07-31','2024-08-31','2024-09-30','2024-10-31','2024-11-30','2024-12-31','2025-01-31','2025-02-28','2025-03-31','2025-04-30','2025-05-31','2025-06-30','2025-07-31'])) AS date_to
  ) AS period_to_date.time_series
  INNER JOIN (
    SELECT
      toDateTime64(toStartOfMonth(toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC'), 'UTC'), 0, 'UTC') period_to_date__time_month,
      sum(period_to_date_revenue_ytd_cumulative__period_to_date.revenue) period_to_date__revenue_ytd
    FROM default.test AS period_to_date_revenue_ytd_cumulative__period_to_date
    WHERE (
      toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') >= toDateTime64(toStartOfYear(parseDateTimeBestEffort('2024-01-01'), 'UTC'), 0, 'UTC')
      AND toTimeZone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') <= parseDateTimeBestEffort('2025-07-31')
    )
    GROUP BY period_to_date__time_month
  ) AS period_to_date_revenue_ytd_cumulative__base
    ON period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64(toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC')
    AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month <= period_to_date.time_series.date_to
  GROUP BY period_to_date__time_month
) AS q_0
ORDER BY period_to_date__time_month ASC
LIMIT 10000
```
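Distilled, the failing pattern is a generated time series LEFT JOINed onto an aggregated subquery with range (inequality) conditions in the `ON` clause. A minimal sketch of that shape, against a hypothetical `events(time DateTime, revenue Float64)` table (table and column names are illustrative, not from the actual deployment, and this sketch is not a verified reproduction):

```sql
-- Hypothetical minimal shape of the failing query: time series LEFT JOIN
-- aggregate, joined on inequality conditions rather than equality keys.
SELECT series.date_from, sum(agg.revenue_month)
FROM (
  SELECT
    parseDateTimeBestEffort(arrayJoin(['2025-01-01','2025-02-01'])) AS date_from,
    parseDateTimeBestEffort(arrayJoin(['2025-01-31','2025-02-28'])) AS date_to
) AS series
LEFT JOIN (
  SELECT toDateTime(toStartOfMonth(time)) AS month, sum(revenue) AS revenue_month
  FROM events
  GROUP BY month
) AS agg
  ON agg.month >= toDateTime(toStartOfYear(series.date_from))
  AND agg.month <= series.date_to
GROUP BY series.date_from
```

Swapping `LEFT JOIN` for `INNER JOIN` avoids the error per the comment above, but drops time-series rows with no matching data, which is why the result can be wrong.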
@igorlukanin it's starting to become a bit of a problem, at least for me :) . Do you have any workaround? Is there a chance of this being fixed?
Hey @igorlukanin, still no solution yet?
Hey @igorlukanin,
We are encountering a similar issue when using ClickHouse with rolling-window measures combined with date-range filters and a date/time granularity. The query runs fine without a granularity, but once one is added, we receive an error that appears to reflect PostgreSQL syntax being applied incorrectly:

```
Query failed: Error: Syntax error: failed at position 118 ('VALUES') (line 5, col 7): VALUES (...
```
Any suggestions or insights to resolve this would be greatly appreciated.