Filter pushdown issue with aggregating inline views and CTE's
Filters are applied to the outer SELECT. This cannot work with aggregating inline views and CTE's (common table expressions if dimensionality is reduced. With FILTER_PARAMS we can push down the filter inside the inline view or CTS like this:
Having this simple example with a transaction table:
CREATE TABLE food_sales (
CREATED_TS timestamp PRIMARY KEY,
ITEM string,
AMOUNT integer
);
and a Cube defined for aggregating top-5 with all others (typical pie chart) we want to filter for AMOUNT < n (eg. finding top-5 items in a lower segment):
cube(`FoodSales_Aggr_Top5`, {
sql: `WITH top5 AS (
SELECT ITEM, sum(AMOUNT) AS AMOUNT_SUM
FROM food_sales
WHERE ${FILTER_PARAMS.FoodSales_Aggr_Top5.amount.filter('AMOUNT')}
GROUP BY ITEM
ORDER BY AMOUNT_SUM DESC
LIMIT 5
)
SELECT * FROM top5
UNION ALL
SELECT 'Others' as ITEM, sum(AMOUNT) AS AMOUNT_SUM
FROM food_sales
WHERE ${FILTER_PARAMS.FoodSales_Aggr_Top5.amount.filter('AMOUNT')}
AND ITEM NOT IN (SELECT ITEM FROM top5)`,
dataSource: `default`,
title: `Food Sales Top 5`,
description: `Food Sales aggregated Top5 sales`,
dimensions: {
created: {
sql: 'CREATED_TS',
type: 'time',
primaryKey: true,
shown: false
},
item: {
sql: `ITEM`,
type: `string`,
},
amount: {
sql: `AMOUNT`,
type: `number`,
shown: false
}
},
measures: {
sumAmount: {
sql: `AMOUNT_SUM`,
type: `sum`
}
}
});
Now we define this query with the filter AMOUNT < 1000:
{
measures: [
'FoodSales_Aggr_Top5.sumAmount'
],
dimensions: [
'FoodSales_Aggr_Top5.item'
],
filters: [{
member: 'FoodSales_Aggr_Top5.amount',
operator: 'lt',
values: ['1000']
}]
}
This is the generated SQL:
SELECT "food_sales__aggr__top5".ITEM "food_sales__aggr__top5__item",
sum("food_sales__aggr__top5".AMOUNT_SUM) "food_sales__aggr__top5__sum_amount"
FROM (
WITH top5 AS (
SELECT ITEM,
sum(AMOUNT) AS AMOUNT_SUM
FROM food_sales
WHERE AMOUNT < ?
GROUP BY ITEM
ORDER BY AMOUNT_SUM DESC
LIMIT 5)
SELECT * FROM top5
UNION ALL
SELECT 'Others' as ITEM,
sum(AMOUNT) AS AMOUNT_SUM
FROM food_sales
WHERE AMOUNT < ? AND ITEM NOT IN (SELECT ITEM FROM top5)
) AS "food_sales__aggr__top5"
WHERE ("food_sales__aggr__top5".AMOUNT < ?)
GROUP BY 1
ORDER BY 2 DESC
As we see we have an additional WHERE clause on the outer SELECT where we do not have the AMOUNT field because of aggregation.
My expectation is that this is not needed here because we have explicitly used FILTER_PARAMS for this field inside the SQL. Also, I would expect the field AMOUNT would not appear in the outer SELECT since I set shown: false in the Cube definition.
Maybe there is already a solution to such kind of queries and I haven't found it yet?
@ralfbecher Hey Ralf! That's a great question! As of now the best way to approach it is with multiple queries: For top items
{
measures: [
'FoodSales.sumAmount'
],
dimensions: [
'FoodSales.item'
],
order: {
'FoodSales.sumAmount': 'desc'
},
limit: 5
}
And then for others
{
measures: [
'FoodSales.sumAmount'
],
filters: [{
dimensions: 'FoodSales.item',
operator: 'notEquals',
values: [...topItems]
}]
}
It isn't obvious but those 2 separate queries usually faster than a big one and also easier to maintain. We're considering to make these "other" queries as part of an API.
Let's use this issue to track this API enhancement.
@paveltiunov Hi Pavel, thanks for your suggestion. I also had in mind to decompose the SQL but then we run into a pipeline approach with longer roundtrip times.
However, my intention here was not to solve the top-5 "other" problem (it was just one example), and I would appreciate any solution in Cube.js for that, but to address the filter pushdown (predicate pushdown) inside the inner SELECT statement(s) or CTEs especially for aggregation queries.
This means to have a control where a WHERE clause is added. And I think if we explicitly use FILTER_PARAMS these filters should not be added as WHERE clause to the enclosing outer SELECT statement, or we should be able to control this by a filter or dimension property.
@ralfbecher Could you please provide exact SQL query you need to reproduce then?
@paveltiunov Let's have an inline view with aggregation where I want to filter on a criteria which is not part of the dimensionality (where amount > 1000, or where country = 'UK'):
SELECT
item,
week,
amount_sum - lag(amount_sum) over(PARTITION BY item ORDER BY week) as amount_diff
FROM (
SELECT
item,
WEEK(created_ts) as week,
sum(amount) as amount_sum
FROM food_sales
WHERE amount > 1000
GROUP BY
item,
week
)
Seconding this request - there should be an ability to pass filters into the query that are not also added to the WHERE clause.
I encountered a similar issue where a double filtering with LEFT JOIN caused the loss of main table data. Are there any ways to avoid this?
Any way to pass FILTER_PARAMS into a query, but also not creating the outer WHERE clause?
I've been trying to use FILTER_PARAMS inside a query, only to have an unnecessary WHERE clause on the outer query built by Cube.
Is there a way we can disable the building of the outer WHERE clause derived from the FILTER_PARAMS?
Any way to pass FILTER_PARAMS into a query, but also not creating the outer WHERE clause?
No, this is not supported.
@gmartpad Could you please elaborate on your use case? What are you trying to model/calculate?
(I think it's fine to leave a comment on the same topic just once. Thank you!)