
Using time dimension date filter in SQL

Open iabhitech opened this issue 4 months ago • 3 comments

Problem

How to use date filters in SQL queries that are generated by pre_aggregations.

Background: I have a very complex SQL query that runs daily; a date filter must be passed in for the query to run at all.

SELECT '2024-01-01' as date, username, SUM(sales) as total_sales, more fields
FROM sales
-- COMPLEX JOINS BASED ON THE PASSED DATE
WHERE sales.date = '2024-01-01'
GROUP BY date, username -- plus any other non-aggregated fields

As of now, I can do this with dynamic filters, but they don't work with pre_aggregations: during the pre-aggregation build, the filter is replaced with (1 = 1).

Expected behaviour: I want pre_aggregations to run daily, pass the current date to the SQL query, and store the result. Currently, if I remove the date filter from the SQL query, the query becomes too complex and takes a very long time to execute.

Related Cube.js schema

cube(`user_sales`, {
  sql: `SELECT '2024-01-01' as date, username, SUM(sales) as total_sales, more fields
        FROM sales
        WHERE ${FILTER_PARAMS.user_sales.date.filter('sales.date')}`,

  pre_aggregations: {
    main: {
      type: `original_sql`,
      time_dimension: CUBE.date,
      partition_granularity: `day`,
    },
  },
})

iabhitech avatar Mar 04 '24 07:03 iabhitech

Hi @iabhitech 👋

A number of points here:

  1. Please use rollup pre-aggregations instead of original_sql
  2. Please decompose your complex query in `sql` into multiple cubes, joins between them, and subquery dimensions

If you show the query that you have there, I might try to suggest how to do that.
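For reference, a rollup pre-aggregation along those lines might look like this. This is only a sketch based on the fields visible in the issue; the cube name `user_sales`, the `total_sales` measure, and the `username` dimension are assumptions:

```javascript
cube(`user_sales`, {
  sql_table: `sales`,

  dimensions: {
    date: {
      sql: `date`,
      type: `time`,
    },
    username: {
      sql: `username`,
      type: `string`,
    },
  },

  measures: {
    // Aggregation is declared here instead of inside the cube's SQL
    total_sales: {
      sql: `sales`,
      type: `sum`,
    },
  },

  pre_aggregations: {
    // Rollup pre-aggregation, partitioned by day, so each daily
    // partition is built with the matching date range filter
    sales_by_user: {
      measures: [CUBE.total_sales],
      dimensions: [CUBE.username],
      time_dimension: CUBE.date,
      granularity: `day`,
      partition_granularity: `day`,
    },
  },
})
```

With `partition_granularity: day`, Cube builds one partition per day and constrains each partition's build query by the time dimension, which is what the original question was trying to achieve with FILTER_PARAMS.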

igorlukanin avatar Mar 20 '24 21:03 igorlukanin

@iabhitech Did my advice help?

igorlukanin avatar May 14 '24 10:05 igorlukanin

@igorlukanin Thanks for the advice.

I'm already using cubes in a similar manner, but decomposing the query into smaller cubes has proven difficult to manage and maintain. This is primarily because the pivot operations need most of the columns at once, which causes lengthy cache refresh times.

My use case involves managing a highly normalized production database with tables such as customer, customer hierarchy, outlets, orders, rewards, and customer activities. I need to generate a customer overview report with over 40 columns including Date, Customer Id, Customer Name, Total Orders Taken, Total Visit Done, etc.

Currently, a cron job runs daily to store these fields in a flat table (with a unique key of customer_id, date), which is then served to a pivot frontend library.

I'm seeking advice on how to implement this using cubes, with the goal of running a daily query to store the data and serving results from the cache only, eliminating the need for real-time refreshes.
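A scheduled, partitioned rollup could plausibly replace the daily cron job here. This is a sketch only; the pre-aggregation name and the `customer_id`, `customer_name`, `total_orders`, and `total_visits` members are placeholders for the actual 40+ report columns:

```javascript
// Inside the cube definition: a rollup keyed by (customer_id, date),
// refreshed on a daily schedule so queries are served from cache only.
pre_aggregations: {
  customer_overview: {
    measures: [CUBE.total_orders, CUBE.total_visits],
    dimensions: [CUBE.customer_id, CUBE.customer_name],
    time_dimension: CUBE.date,
    granularity: `day`,
    // Day-sized partitions: only the most recent partition
    // needs rebuilding on each refresh
    partition_granularity: `day`,
    // Build in the background on a schedule instead of on demand
    scheduled_refresh: true,
    refresh_key: {
      every: `1 day`,
    },
  },
},
```

This mirrors the flat table with a (customer_id, date) key: one row per customer per day, rebuilt once a day by the refresh worker rather than by an external cron.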

iabhitech avatar May 14 '24 11:05 iabhitech