cube icon indicating copy to clipboard operation
cube copied to clipboard

Rolling windows for clickhouse

Open mjvankampen opened this issue 9 months ago • 8 comments

Is your feature request related to a problem? Please describe. I want to use rolling windows in clickhouse but I cannot get them to work when applying a granularity. Looking at the tests running for clickhouse I guess this is not supported?

Describe the solution you'd like Rolling windows work with granularity applied.

Describe alternatives you've considered At least document that it is not supported so people don't lose time. Perhaps include an alternative.

Additional context Ref. https://github.com/cube-js/cube/blob/eab3ce906e1b0af1551057b567d01d96b635ed01/packages/cubejs-schema-compiler/test/integration/clickhouse/custom-granularities.test.ts#L329

I think this support is now there, ref. https://clickhouse.com/docs/sql-reference/statements/select/join#join-with-inequality-conditions-for-columns-from-different-tables

For this you need join_use_nulls=0, which cube does when clickhouse is readonly

mjvankampen avatar May 21 '25 10:05 mjvankampen

@mjvankampen Does it not work for you only with any granularity or only with custom granularities? It would be great to see the data model + query that failed for you.

igorlukanin avatar Jun 19 '25 18:06 igorlukanin

@igorlukanin For example using the example from the docs with the query used there (I did add a date range) I get the following error using a standard granularity:

Image

using value does work, but typically not very useful.

Image
cubes:
  - name: orders
    sql: >
      SELECT 1 AS id, '2025-01-01'::TIMESTAMP AS time UNION ALL
      SELECT 2 AS id, '2025-01-11'::TIMESTAMP AS time UNION ALL
      SELECT 3 AS id, '2025-01-21'::TIMESTAMP AS time UNION ALL
      SELECT 4 AS id, '2025-01-31'::TIMESTAMP AS time UNION ALL
      SELECT 5 AS id, '2025-02-01'::TIMESTAMP AS time UNION ALL
      SELECT 6 AS id, '2025-02-11'::TIMESTAMP AS time UNION ALL
      SELECT 7 AS id, '2025-02-21'::TIMESTAMP AS time UNION ALL
      SELECT 8 AS id, '2025-03-01'::TIMESTAMP AS time UNION ALL
      SELECT 9 AS id, '2025-03-11'::TIMESTAMP AS time UNION ALL
      SELECT 10 AS id, '2025-03-21'::TIMESTAMP AS time UNION ALL
      SELECT 11 AS id, '2025-03-31'::TIMESTAMP AS time UNION ALL
      SELECT 12 AS id, '2025-04-01'::TIMESTAMP AS time
 
    dimensions:
      - name: time
        sql: time
        type: time
 
    measures:
      - name: rolling_count_month
        sql: id
        type: count
        rolling_window:
          trailing: unbounded

mjvankampen avatar Jun 22 '25 19:06 mjvankampen

I confirm it's a specific issue between clickhouse and cube. I created a CH table :

Image

And I have the following error when reproducing the test :

Image

And the same definition as in the example :

Image

I even tried to add the allow_experimental_join_condition in the cube sql definition, without success :

Image

Here is the sql generated by cube : SELECT q_0.period_to_date__time_month, period_to_date__revenue_ytd period_to_date__revenue_ytdFROM ( SELECTperiod_to_date.time_series.date_from period_to_date__time_month, sum(period_to_date__revenue_ytd) period_to_date__revenue_ytdFROM ( SELECT parseDateTimeBestEffort( arrayJoin( ['2024-07-01T00:00:00.000','2024-08-01T00:00:00.000','2024-09-01T00:00:00.000','2024-10-01T00:00:00.000','2024-11-01T00:00:00.000','2024-12-01T00:00:00.000','2025-01-01T00:00:00.000','2025-02-01T00:00:00.000','2025-03-01T00:00:00.000','2025-04-01T00:00:00.000','2025-05-01T00:00:00.000','2025-06-01T00:00:00.000','2025-07-01T00:00:00.000'] ) ) as date_from, parseDateTimeBestEffort( arrayJoin( ['2024-07-31T23:59:59.999','2024-08-31T23:59:59.999','2024-09-30T23:59:59.999','2024-10-31T23:59:59.999','2024-11-30T23:59:59.999','2024-12-31T23:59:59.999','2025-01-31T23:59:59.999','2025-02-28T23:59:59.999','2025-03-31T23:59:59.999','2025-04-30T23:59:59.999','2025-05-31T23:59:59.999','2025-06-30T23:59:59.999','2025-07-31T23:59:59.999'] ) ) as date_to ) ASperiod_to_date.time_seriesLEFT JOIN ( SELECT toDateTime64( toStartOfMonth( toTimeZone( toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0 ), 'UTC' ), 'UTC' ), 0, 'UTC' ) period_to_date__time_month, sum( period_to_date_revenue_ytd_cumulative__period_to_date.revenue ) period_to_date__revenue_ytdFROM ( SELECT * from t12.test SETTINGS allow_experimental_join_condition = 1 ) ASperiod_to_date_revenue_ytd_cumulative__period_to_dateWHERE ( toTimeZone( toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0 ), 'UTC' ) >= toDateTime64( toStartOfYear(parseDateTimeBestEffort(?), 'UTC'), 0, 'UTC' ) AND toTimeZone( toDateTime64( period_to_date_revenue_ytd_cumulative__period_to_date.time, 0 ), 'UTC' ) <= parseDateTimeBestEffort(?) ) GROUP BY period_to_date__time_month) ASperiod_to_date_revenue_ytd_cumulative__baseONperiod_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64( toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC' ) AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month<=period_to_date.time_series.date_toGROUP BYperiod_to_date__time_month) as q_0 ORDER BYperiod_to_date__time_month ASC LIMIT 10000

SaisonYann avatar Jul 01 '25 10:07 SaisonYann

We tried to use a more recent version of clickhouse to have experimental_join accepted by default : ClickHouse server version 25.5.2.47 (official build).

We have now the same issue as mjvankampen

Image

SaisonYann avatar Jul 03 '25 06:07 SaisonYann

After rewriting the query sent by cube to clickhouse we found that it may be caused by the left join (https://github.com/ClickHouse/ClickHouse/issues/76317). Once we replace it with an inner join it works (but the result can be obviously false). Here is the original query : SELECT q_0.period_to_date__time_month, period_to_date__revenue_ytd period_to_date__revenue_ytdFROM (SELECTperiod_to_date.time_series.date_from period_to_date__time_month, Sum(period_to_date__revenue_ytd) period_to_date__revenue_ytdFROM (SELECT parseDateTimeBestEffort(arrayJoin( ['2024-01-01t00:00:00.000','2024-02-01t00:00:00.000','2024-03-01t00:00:00.000','2024-04-01t00:00:00.000','2024-05-01t00:00:00.000','2024-06-01t00:00:00.000','2024-07-01t00:00:00.000','2024-08-01t00:00:00.000','2024-09-01t00:00:00.000','2024-10-01t00:00:00.000','2024-11-01t00:00:00.000','2024-12-01t00:00:00.000','2025-01-01t00:00:00.000','2025-02-01t00:00:00.000','2025-03-01t00:00:00.000','2025-04-01t00:00:00.000','2025-05-01t00:00:00.000','2025-06-01t00:00:00.000','2025-07-01t00:00:00.000'] )) AS date_from, parseDateTimeBestEffort(arrayJoin( ['2024-01-31t23:59:59.999','2024-02-29t23:59:59.999','2024-03-31t23:59:59.999','2024-04-30t23:59:59.999','2024-05-31t23:59:59.999','2024-06-30t23:59:59.999','2024-07-31t23:59:59.999','2024-08-31t23:59:59.999','2024-09-30t23:59:59.999','2024-10-31t23:59:59.999','2024-11-30t23:59:59.999','2024-12-31t23:59:59.999','2025-01-31t23:59:59.999','2025-02-28t23:59:59.999','2025-03-31t23:59:59.999','2025-04-30t23:59:59.999','2025-05-31t23:59:59.999','2025-06-30t23:59:59.999','2025-07-31t23:59:59.999'] )) AS date_to) ASperiod_to_date.time_seriesLEFT JOIN (SELECT toDateTime64(toStartOfMonth(toTimezone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0 ) , 'UTC'), 'UTC'), 0, 'UTC') period_to_date__time_month, Sum(period_to_date_revenue_ytd_cumulative__period_to_date.revenue) period_to_date__revenue_ytdFROM default.test ASperiod_to_date_revenue_ytd_cumulative__period_to_dateWHERE ( toTimezone( toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') >= toDateTime64(toStartOfYear(parseDateTimeBestEffort ('2024-01-01T00:00:00.000'), 'UTC' ), 0, 'UTC') AND toTimezone(toDateTime64( period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') <= parseDateTimeBestEffort('2025-07-31T23:59:59.999') ) GROUP BY period_to_date__time_month) AS period_to_date_revenue_ytd_cumulative__baseONperiod_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64(toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC' ) AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month<=period_to_date.time_series.date_toGROUP BYperiod_to_date__time_month) AS q_0 ORDER BY period_to_date__time_monthASC LIMIT 10000

And the updated query : SELECT q_0.period_to_date__time_month, period_to_date__revenue_ytd period_to_date__revenue_ytdFROM (SELECTperiod_to_date.time_series.date_from period_to_date__time_month, Sum(period_to_date__revenue_ytd) period_to_date__revenue_ytdFROM (SELECT parseDateTimeBestEffort(arrayJoin( ['2024-01-01','2024-02-01','2024-03-01','2024-04-01','2024-05-01','2024-06-01','2024-07-01','2024-08-01','2024-09-01','2024-10-01','2024-11-01','2024-12-01','2025-01-01','2025-02-01','2025-03-01','2025-04-01','2025-05-01','2025-06-01','2025-07-01'] )) AS date_from, parseDateTimeBestEffort(arrayJoin( ['2024-01-31','2024-02-29','2024-03-31','2024-04-30','2024-05-31','2024-06-30','2024-07-31','2024-08-31','2024-09-30','2024-10-31','2024-11-30','2024-12-31','2025-01-31','2025-02-28','2025-03-31','2025-04-30','2025-05-31','2025-06-30','2025-07-31'] )) AS date_to) ASperiod_to_date.time_seriesINNER JOIN (SELECT toDateTime64(toStartOfMonth(toTimezone(toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0 ) , 'UTC'), 'UTC'), 0, 'UTC') period_to_date__time_month, Sum(period_to_date_revenue_ytd_cumulative__period_to_date.revenue) period_to_date__revenue_ytdFROM default.test ASperiod_to_date_revenue_ytd_cumulative__period_to_dateWHERE ( toTimezone( toDateTime64(period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') >= toDateTime64(toStartOfYear(parseDateTimeBestEffort ('2024-01-01'), 'UTC' ), 0, 'UTC') AND toTimezone(toDateTime64( period_to_date_revenue_ytd_cumulative__period_to_date.time, 0), 'UTC') <= parseDateTimeBestEffort('2025-07-31') ) GROUP BY period_to_date__time_month) AS period_to_date_revenue_ytd_cumulative__baseONperiod_to_date_revenue_ytd_cumulative__base.period_to_date__time_month >= toDateTime64(toStartOfYear(period_to_date.time_series.date_from, 'UTC'), 0, 'UTC' ) AND period_to_date_revenue_ytd_cumulative__base.period_to_date__time_month<=period_to_date.time_series.date_toGROUP BYperiod_to_date__time_month) AS q_0 ORDER BY period_to_date__time_monthASC LIMIT 10000

SaisonYann avatar Jul 03 '25 07:07 SaisonYann

@igorlukanin it's starting to become a bit of a problem, at least for me :) . Do you have any workaround, is there a chance of this being fixed?

mjvankampen avatar Aug 21 '25 10:08 mjvankampen

Hey @igorlukanin, still no solution yet ?

SaisonYann avatar Oct 08 '25 13:10 SaisonYann

Hey @igorlukanin,

We are encountering a similar issue when using ClickHouse with rolling window functions combined with date range filters and date/time granularity applied. The query runs fine without applying granularity, but once granularity is added, we receive an error that appears to reflect PostgreSQL syntax being applied incorrectly:

Query failed: Error: Syntax error: failed at position 118 ('VALUES') (line 5, col 7): VALUES (...

Any suggestions or insights to resolve this would be greatly appreciated.

orders_rolling.yml

tarunparswani26 avatar Oct 30 '25 13:10 tarunparswani26