Intermittently incorrect values when using calculated measures on pre-aggregated leaf measures
Describe the bug Querying the pre-aggregate using the REST API will intermittently generate incorrect results. Upon inspecting the logs, it appears that:
- The generated SQL statement is correct
- However, the cubestore query planner forms incorrect aliases, causing measures to be swapped around
This behavior is not consistent -- similar REST API queries that should return the same result will occasionally be correct.
To Reproduce Using the following REST API query:
{
"query": {
"measures": [
"sales.this_year_sales",
"sales.this_year_units",
"sales.yoy_sales_chg",
"sales.yoy_units_chg"
],
"dimensions": [
"sales.product_type"
],
"filters": [
{
"member": "sales.product_type",
"operator": "equals",
"values": ["PET_FOOD"]
},
{
"member": "sales.date",
"operator": "inDateRange",
"values": ["2025-01-18","2025-01-24"]
}
],
"limit": 5,
"order": {
"sales.this_year_sales": "desc"
},
"offset": 0
}
}
- Execute the query - observe that results may or may not be correct
- Modify the
limitparameter value, which should have no effect on the query results - Execute the modified query - observe that results may or may not be correct
- Repeat steps 2 and 3 as necessary until incorrect results are returned (personal observations yield a ~50% success rate)
In the screenshots section below, observe that the (implicitly added) last_year_sales and last_year_units values are swapped, causing the yoy_sales_chg and yoy_units_chg to return incorrect values.
For example, lets say that the values stored in the pre-aggregate are as follows:
| date_day | product_type | this_year_sales | last_year_sales | this_year_units | last_year_units |
|---|---|---|---|---|---|
| 2025-01-21 | PET_FOOD | 1000 | 900 | 10 | 9 |
In this example, the expected values are:
- yoy_sales_chg = 1000 - 900 = 100
- yoy_units_chg = 10 - 9 = 1
Instead, the following is returned:
- yoy_sales_chg = 1000 - 9 = 991
- yoy_units_chg = 10 - 900 = -890
Expected behavior Query results should be consistent and correct.
Screenshots The generated SQL query in the logs looks correct (formatted for readability):
SELECT
`sales__product_type` `sales__product_type`,
sum(`sales__this_year_sales`) `sales__this_year_sales`,
sum(`sales__this_year_units`) `sales__this_year_units`,
(sum(`sales__this_year_sales`) - sum(`sales__last_year_sales`)) `sales__yoy_sales_chg`,
(sum(`sales__this_year_units`) - sum(`sales__last_year_units`)) `sales__yoy_units_chg`
FROM cube_pre_aggregations.sales_sales_rollup_20250101_mwnaosch_js50fjs4_1jq1ihq AS `sales__sales_rollup`
WHERE (`sales__product_type` = ?)
AND (`sales__date_day` >= to_timestamp(?) AND `sales__date_day` <= to_timestamp(?))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
However, the following log was observed (formatted for readability) which appears to internally swap the last_year_sales and last_year_units leaf measures. I suspect this is leading to incorrectly calculated yoy_sales_chg and yoy_units_chg values:
2025-01-31T19:28:29.313Z WARN [cubestore::queryplanner::query_executor] <pid:41 sel1> Slow Partition Query (262.100554ms):
Projection, [
sales__product_type,
sales__this_year_sales,
sales__this_year_units,
sales__yoy_sales_chg,
sales__yoy_units_chg
]
Projection, [
sales__sales_rollup.sales__product_type:sales__product_type,
sales__this_year_sales,
sales__this_year_units,
SUM(sales__sales_rollup.sales__last_year_sales):SUM(sales__sales_rollup.sales__last_year_units),
SUM(sales__sales_rollup.sales__last_year_units):SUM(sales__sales_rollup.sales__last_year_sales)
]
ClusterAggregateTopK, limit: 5
Filter
Scan sales__sales_rollup,
source: CubeTable(index: default:3:[5, 6, 41]),
fields: [
sales__product_type,
sales__date_day,
sales__last_year_sales,
sales__last_year_units,
sales__this_year_sales,
sales__this_year_units
]
Minimally reproducible Cube Schema
cubes:
- name: sales
sql_table: "{{ env_var('CUBEJS_DB_BQ_TABLE_NAME') | safe }}"
data_source: default
refresh_key:
every: 30 minutes
pre_aggregations:
- name: sales_rollup
external: true
dimensions:
- date
- product_type
measures:
- this_year_sales
- last_year_sales
- this_year_units
- last_year_units
time_dimension: date
granularity: day
partition_granularity: month
refresh_key:
every: 30 minutes
build_range_start:
sql: select timestamp('2025-01-17')
build_range_end:
sql: select timestamp('2025-01-25')
dimensions:
- name: date
sql: TIMESTAMP(date)
type: time
- name: product_type
sql: product_type
type: string
- name: product_name
sql: product_name
type: string
- name: product_id
sql: product_id
type: string
measures:
- name: this_year_sales
sql: this_year_sales
type: sum
- name: last_year_sales
sql: last_year_sales
type: sum
- name: this_year_units
sql: this_year_units
type: sum
- name: last_year_units
sql: last_year_units
type: sum
- name: yoy_sales_chg
sql: "{this_year_sales} - {last_year_sales}"
type: number
- name: yoy_units_chg
sql: "{this_year_units} - {last_year_units}"
type: number
Version: Behavior was observed using Docker images with tags between v1.1.8 and v1.3.20:
- cubejs/cube:v1.3.20
- cubejs/cubestore:v1.3.20
This is not reproducible using cubejs/cubestore:v1.1.7, and appears to have been introduced in v1.1.8.
Additional context
- Running in production mode (1 api, 1 refresh worker, 1 router, 2 workers)
- Raw data is stored Google BigQuery
- Pre-aggregations are stored in GCS using export bucket strategy
_edit: retested using latest (1.2.20) on 2025-03-10 and issue is still present _edit: retested using latest (1.3.8) on 2025-04-25 and issue is still present _edit: retested using latest (1.3.20) on 2025-06-09 and issue is still present
Thanks for reporting @Kelvin4702. Let us investigate and get back to you.
Any news on this ? We have the exact same issue and it's blocking
@Kelvin4702 @Rmannn Could you please try v1.3.7 (latest)?
I've tried to reproduce this on the latest and everything works as expected for me.
Data model that I've used (with DuckDB):
cubes:
- name: sales
sql: >
SELECT '2025-01-18' AS date, 'PET_FOOD' AS product_type, 'B1' AS product_name, 2 AS product_id, 200 AS this_year_sales, 100 AS last_year_sales, 2000 AS this_year_units, 1000 AS last_year_units UNION ALL
SELECT '2025-01-19' AS date, 'PET_FOOD' AS product_type, 'C1' AS product_name, 3 AS product_id, 300 AS this_year_sales, 150 AS last_year_sales, 3000 AS this_year_units, 1500 AS last_year_units UNION ALL
SELECT '2025-01-20' AS date, 'PET_FOOD' AS product_type, 'D1' AS product_name, 4 AS product_id, 400 AS this_year_sales, 200 AS last_year_sales, 4000 AS this_year_units, 2000 AS last_year_units
data_source: default
refresh_key:
every: 30 minutes
pre_aggregations:
- name: sales_rollup
external: true
dimensions:
- date
- product_type
measures:
- this_year_sales
- last_year_sales
- this_year_units
- last_year_units
time_dimension: date
granularity: day
partition_granularity: month
refresh_key:
every: 30 minutes
build_range_start:
sql: select '2025-01-17'::TIMESTAMP
build_range_end:
sql: select '2025-01-25'::TIMESTAMP
dimensions:
- name: date
sql: "date::TIMESTAMP"
type: time
- name: product_type
sql: product_type
type: string
- name: product_name
sql: product_name
type: string
- name: product_id
sql: product_id
type: string
measures:
- name: this_year_sales
sql: this_year_sales
type: sum
- name: last_year_sales
sql: last_year_sales
type: sum
- name: this_year_units
sql: this_year_units
type: sum
- name: last_year_units
sql: last_year_units
type: sum
- name: yoy_sales_chg
sql: "{this_year_sales} - {last_year_sales}"
type: number
- name: yoy_units_chg
sql: "{this_year_units} - {last_year_units}"
type: number
One of many query runs with different LIMIT:
Just tried on v1.3.8 -- issue still seems to be occurring.
Here's an excerpt from the cubestore worker logs, where we can see a query with "limit": 14 answer correctly, then another query with "limit": 15 answer incorrectly:
2025-04-25T15:34:32.030Z INFO [cubestored] <pid:96 sel3> Cube Store version 1.3.8
2025-04-25T15:34:32.361Z INFO [cubestore::cluster] <pid:3> Running select completed (376.196357ms)
2025-04-25T15:34:32.366Z WARN [cubestore::queryplanner::query_executor] <pid:3> Slow Query (377.587957ms):
Projection, [sales__product_type, sales__this_year_sales, sales__this_year_units, sales__yoy_sales_chg, sales__yoy_units_chg]
Projection, [sales__sales_rollup.sales__product_type:sales__product_type, sales__this_year_sales, sales__this_year_units, SUM(sales__sales_rollup.sales__last_year_sales), SUM(sales__sales_rollup.sales__last_year_units)]
ClusterAggregateTopK, limit: 14
Filter
Scan sales__sales_rollup, source: CubeTable(index: default:8:[8, 15]), fields: [sales__product_type, sales__date_day, sales__last_year_sales, sales__last_year_units, sales__this_year_sales, sales__this_year_units]
2025-04-25T15:34:42.392Z INFO [cubestored] <pid:125 sel4> Cube Store version 1.3.8
2025-04-25T15:34:42.588Z INFO [cubestore::cluster] <pid:3> Running select completed (233.05261ms)
2025-04-25T15:34:42.589Z WARN [cubestore::queryplanner::query_executor] <pid:3> Slow Query (233.877108ms):
Projection, [sales__product_type, sales__this_year_sales, sales__this_year_units, sales__yoy_sales_chg, sales__yoy_units_chg]
Projection, [sales__sales_rollup.sales__product_type:sales__product_type, sales__this_year_sales, sales__this_year_units, SUM(sales__sales_rollup.sales__last_year_sales):SUM(sales__sales_rollup.sales__last_year_units), SUM(sales__sales_rollup.sales__last_year_units):SUM(sales__sales_rollup.sales__last_year_sales)]
ClusterAggregateTopK, limit: 15
Filter
Scan sales__sales_rollup, source: CubeTable(index: default:8:[8, 15]), fields: [sales__product_type, sales__date_day, sales__last_year_sales, sales__last_year_units, sales__this_year_sales, sales__this_year_units]
We are also experiencing this same issue on v1.3.8. We have downgraded to 1.1.7 in the meantime and can confirm that this error does not occur in this version.
Same,
- trying 1.3.8 to 1.3.11, issue is still present.
Adding an orderBy to the cube request seems to fix the issue.
Adding order by also helped me, as a workaround I used query rewrite to add order to the query when it doens't exist, so the user will not need to remember that
Unfortunately, adding an order parameter to the REST API request did not remove the issue for the example in the original post :(
Tested using the latest version (1.3.20).
@Kelvin4702 It worked for us on version 1.3.15 Alternatively you can try to add to your payload all the meassures that constract other measures.
In your case, add:
- sales.last_year_sales
- sales.last_year_units
@igorlukanin To emphasize- this issue occurs when using measures that are composed out of other measures. Since itβs a silent bug, customers may not be aware it affects them. Consider adding warning tooltip next to this part of the documentation with the afected versions.
Moving to non-partitioned pre-aggregation solved the issue for us, but not always possible
Finally in our case I think it was because our API and the Refresh Worker were concurrently trying to build our cube partitioned pre-aggregations, resulting in errors and partially built pre-aggs, causing the shift in the columns.