cube
cube copied to clipboard
BUG pre-aggregation not selected on many-to-many relationship + measure
Problem I am running into a problem with pre-aggregations I have three tables from postgresql :
- client, client_agency, agency
- client and agency have a many to many relationship through client_agency
- the cubes have no measures, only dimensions
I have one table from bigQuery :
- report
- report has client id, and a many to one relationship with client
- report has many measures (all of type SUM)
- a pre aggregation (type : rollup_join, external : true) has been defined containing dimensions from client, client_agency, agency and report, as well as the report measures
DATA MODEL:
cube("client", {
sql: `SELECT * FROM "s_db".clients`,
dataSource: "pg",
dimensions: {
id: {
sql: "id",
type: "number",
primaryKey: true,
},
name: {
sql: "name",
type: "string",
},
updated_at: {
sql: "updated_at",
type: "time",
},
},
joins: {
client_agency: {
sql: `${CUBE.id} = ${client_agency.client_id}`,
relationship: `many_to_one`,
},
},
pre_aggregations: {
main: {
type: `rollup`,
external: true,
dimensions: [CUBE.id, CUBE.name, CUBE.updated_at],
refresh_key: {
every: `1 day`,
sql: `SELECT MAX(updated_at) FROM "s_db".clients`,
},
indexes: {
client_id: {
columns: [CUBE.id],
},
},
},
},
});
cube("agency", {
sql: `SELECT * FROM "s_db".agencies`,
dataSource: "pg",
dimensions: {
id: {
sql: "id",
type: "number",
primaryKey: true,
},
name: {
sql: "name",
type: "string",
},
updated_at: {
sql: "updated_at",
type: "time",
},
},
joins: {
client_agency: {
sql: `${CUBE.id} = ${client_agency.agency_id}`,
relationship: `many_to_one`,
},
},
pre_aggregations: {
main: {
type: `rollup`,
external: true,
dimensions: [CUBE.id, CUBE.name],
refresh_key: {
every: `1 day`,
sql: `SELECT MAX(updated_at) FROM "s_db".agencies`,
},
indexes: {
agency_id: {
columns: [CUBE.id],
},
},
},
},
});
cube("client_agency", {
sql: `SELECT * FROM "s_db".client_agency`,
dataSource: "pg",
dimensions: {
agency_id: {
sql: "agency_id",
type: "number",
primaryKey: true,
},
client_id: {
sql: "client_id",
type: "number",
primaryKey: true,
},
},
joins: {
agency: {
sql: `${CUBE.agency_id} = ${agency.id}`,
relationship: `one_to_many`,
},
client: {
sql: `${CUBE.client_id} = ${client.id}`,
relationship: `one_to_many`,
},
},
pre_aggregations: {
main: {
type: `rollup`,
external: true,
dimensions: [
CUBE.agency_id,
CUBE.client_id
],
measures: [CUBE.count],
refresh_key: {
every: `1 day`,
},
indexes: {
agency_id: {
columns: [CUBE.agency_id],
},
client_id: {
columns: [CUBE.client_id],
},
},
},
},
});
cube("report", {
title: "Report",
name: "report",
sql: "SELECT * FROM reports.report",
dataSource: "bq",
dimensions: {
date: {
sql: "CAST(Date as TIMESTAMP)",
type: "time",
primaryKey: false,
shown: true,
},
account_id: {
sql: "account_id",
type: "string",
primaryKey: true,
shown: true,
},
client_id: {
sql: "client_id",
type: "string",
primaryKey: true,
shown: true,
},
updated_at: {
sql: "updated_at",
type: "time",
primaryKey: false,
shown: true,
},
},
measures: {
impressions: {
sql: "impressions",
type: "sum",
},
clicks: {
sql: "clicks",
type: "sum",
},
},
joins: {
client: {
sql: `${CUBE.client_id} = ${client.id}`,
relationship: "many_to_one",
},
},
pre_aggregations: {
main: {
type: "rollup",
external: true,
dimensions: [CUBE.date, CUBE.account_id, CUBE.client_id, CUBE.updated_at],
measures: [CUBE.impressions, CUBE.clicks],
time_dimension: CUBE.date,
granularity: "day",
partition_granularity: "month",
refresh_key: {
every: "1 minute",
sql: `SELECT MAX(updated_at) FROM (SELECT * FROM reports.report) AS report WHERE ${FILTER_PARAMS.report.date.filter(
"CAST(Date AS TIMESTAMP)"
)}`,
},
indexes: {
client_id: {
columns: [CUBE.client_id],
},
},
},
report_client_client_agency_agency:
{
type: `rollup_join`,
external: true,
time_dimension: CUBE.date,
granularity: `day`,
dimensions: [
CUBE.date,
CUBE.account_id,
CUBE.client_id,
CUBE.updated_at,
client.id,
client.name,
client.updated_at,
client_agency.agency_id,
client_agency.client_id,
agency.id,
agency.name,
agency.updated_at,
],
measures: [
CUBE.impressions,
CUBE.clicks
],
rollups: [
CUBE.main,
client.main,
client_agency.main,
agency.main,
],
},
},
rewriteQueries: false,
});
The problem:
- If I make a query with dimensions from all 4 tables (no measures), the pre aggreagation is selected (the result is correct)
- If I make a query with one measure from report and dimensions from client, the pre aggregation is selected (the result is correct)
- If I make a query with one measure from report and dimensions from client AND AGENCY, the pre aggregation is NOT selected, I get the following error:
To join across data sources use rollupJoin with Cube Store. If rollupJoin is defined, this error indicates it doesn't match the query. Please use Rollup Designer to verify it's definition. Found data sources: bq, pg
Problematic query:
{
"order": {
"report.date": "asc"
},
"dimensions": [
"report.account_id",
"agency.name",
"client.name"
],
"measures": [
"report.clicks"
]
}
But if I use Rollup Designer it tells me to add the metric to the pre-aggregation, just like I already have.
Hi @kriKinesso 👋
My suspicion is that referencing multiple (too many?) pre-aggregations somehow breaks the query matching resolution:
rollups: [
CUBE.main,
client.main,
client_agency.main,
agency.main,
],
Could you please try reworking your pre-aggregations in a way that there are only two pre-aggregations there, one from bq and one from pg?
Hi @igorlukanin 👋, thank you for your answer.
So is there a limit number for the pre aggregations you can reference in a rollup join?
In my case I have to use pre-aggregations for this sort of queries, I absolutely can not allow users to query the datasource directly. If I remove any of the pre-aggregations from rollups the query won't match.
Also, I have other pre-aggregations, with a larger number of rollups that work well. This error comes up only in the case of many to many relationships, which leads me to think that either we are dealing with a bug, or pre-aggregations are not meant to be used with many to many relationships.
In my case I have to use pre-aggregations for this sort of queries, I absolutely can not allow users to query the datasource directly. If I remove any of the pre-aggregations from rollups the query won't match.
Well, I don't think that you should not use pre-aggregations. However, I feel like you might try combining 3 pre-aggregations on the pg data source (that you reference in the rollup join) into one. And then reference only that one in the rollup join.
This error comes up only in the case of many to many relationships, which leads me to think that either we are dealing with a bug, or pre-aggregations are not meant to be used with many to many relationships.
I can't spot the issue by looking at your data model. Let me tag @paveltiunov in case he sees anything suspicions.
I tried grouping the pre-aggregations, and I still get the same error, so unfortunately that is not the issue.