cube icon indicating copy to clipboard operation
cube copied to clipboard

Rollup joins across different data sources with different param allocators causing invalidations

Open pauldheinrichs opened this issue 1 month ago • 0 comments

Describe the bug When attempting to run the orchestration on a druid datasource to rebuild pre-aggregations. The following error is spit out in the API logs after running.

--- EDIT ---

Seems like this for me may be an underlying issue with something i'm working through. My API node is frequently assuming my druid models are my default driver param object in more than just orchestration. It infact is invalidating my pre-aggregations because it's assuming the default driver. I do not have a cube.js file in my instance.

Slack thread:https://cube-js.slack.com/archives/C04NYBJP7RQ/p1715737464066809


(IF i'm just doing it wrong... let me know 😓 )

{
  "action": "post",
      "selector": {
      "contexts": [{ "securityContext": {} }],
      "timezones": ["UTC"],
        "dataSources": ["druid"]
    }
}
{"message":"Downloading external pre-aggregation via query error","queryKey":
[["CREATE TABLE cube_production_pre_aggregations.video_views_video_rollup_count20240415 AS SELECT\n      \"video_views\".video_id \"video_views__video_id\", DATE_TRUNC('day', CAST(TIME_FORMAT(\"video_views\".__time, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS TIMESTAMP)) \"video_views__time_day\", sum(\"video_views\".views) \"video_views__view_count\"\n    FROM\n      TeamVideoView AS \"video_views\"  WHERE (\"video_views\".__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2",["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],{}],[[{"refresh_key":null}]]],"query":"SELECT\n      \"video_views\".video_id \"video_views__video_id\", DATE_TRUNC('day', CAST(TIME_FORMAT(\"video_views\".__time, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS TIMESTAMP)) \"video_views__time_day\", sum(\"video_views\".views) \"video_views__view_count\"\n    FROM\n      TeamVideoView AS \"video_views\"  WHERE (\"video_views\".__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2","values":["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],"targetTableName":"cube_production_pre_aggregations.video_views_video_rollup_count20240415_lnicrtcl_403ejwb4_1j4a3mu","requestId":"6e558a4f-3de1-4e19-a350-fb41acb687a6","newVersionEntry":{"table_name":"cube_production_pre_aggregations.video_views_video_rollup_count20240415","structure_version":"403ejwb4","content_version":"lnicrtcl","last_updated_at":1715801822362,"naming_version":2},"buildRangeEnd":"2024-04-21T23:59:59.999","error":"Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table.__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2","values":["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],"targetTableName":"cube_production_pre_aggregations.video_views_video_rollup_count20240415_lnicrtcl_403ejwb4_1j4a3mu","requestId":"6e558a4f-3de1-4e19-a350-fb41acb687a6","newVersionEntry":{"table_name":"cube_production_pre_aggregations.video_views_video_rollup_count20240415","structure_version":"403ejwb4","content_version":"lnicrtcl","last_updated_at":1715801822362,"naming_version":2},"buildRangeEnd":"2024-04-21T23:59:59.999","error":"Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table

Focusing on specifically Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table

at /cube/node_modules/@cubejs-backend/druid-driver/src/DruidClient.ts:92:19\n    
at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    
at DruidDriver.downloadQueryResults (/cube/node_modules/@cubejs-backend/druid-driver/src/DruidDriver.ts:140:31)\n    
at PreAggregationLoader.refreshReadOnlyExternalStrategy (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:1154:19)\n    
at Object.query (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryCache.ts:602:26)\n    
at QueryQueue.processQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:842:25)"}

To Reproduce Steps to reproduce the behavior:

  1. setup druid datasource and another default (postgres / redshift in my case)
CUBEJS_DATASOURCES: "default,druid"
  1. setup pre-aggregation on druid cube with a refresh key as such
     refreshKey: {
        every: `1 hour`,
        incremental: true
      },

  1. hit the orchestration API with the above API call

Expected behavior It seems that this process isn't utilizing the druid based ? expected variable but rather the postgres driver param in my case (which is redshift / postgres $1 syntax )

IT would appear that this method is being called with a passed in $1 args

image

Version: Latest version

pauldheinrichs avatar May 15 '24 20:05 pauldheinrichs