cube icon indicating copy to clipboard operation
cube copied to clipboard

Support for Partition Granularity and Incremental Data Loading in original_sql

Open alokthakur142 opened this issue 2 years ago • 3 comments

Problem: When working with a large PostgreSQL database and utilizing original_sql for its query complexity advantages, I've encountered a significant bottleneck in the preaggregation process. Creating preaggregations using original_sql takes approximately a minute, causing challenges for our team due to the size of our database.

Recently, I discovered the benefits of partition granularity and incremental data loading in preaggregation. Unfortunately, these features are not currently supported in original_sql.

Possible Solution: I propose implementing support for partition granularity and incremental data loading in original_sql. This enhancement would enable me to optimize data refresh strategies, such as refreshing recent data (e.g., this month's data) every 10 minutes and refreshing other data on a weekly basis. This approach promises a significant improvement in data preparation efficiency.

Additional Context: In our scenario, we manage a large activity_log table that records user actions. Reports generated from this log, such as user activity summaries over a specific duration, team-based activity, and active users within a timeframe, are crucial for decision-making. Implementing partition granularity and incremental data loading would allow us to provide near-real-time reports, empowering users to make informed decisions about their team's activities.

alokthakur142 avatar Nov 29 '23 15:11 alokthakur142

Hi @alokthakur142 👋 Thanks for a very elaborate feature request.

Is there anything that prevents you from using rollup pre-aggregations instead of original_sql ones?

Using rollup pre-aggregations would give you all the benefits you'd like to get, e.g., "partition granularity and incremental data loading."

igorlukanin avatar Nov 30 '23 12:11 igorlukanin

rollup has a limitation on using joins. I want to give user flexibility to join it with multiple data sources which is not possible with rollup join. Example, they can join it with customer, user, reporting person of the user, custom fields for customer or user etc.

alokthakur142 avatar Nov 30 '23 12:11 alokthakur142

@igorlukanin - Any documentation to help me develop the solution?

alokthakur142 avatar Jan 25 '24 11:01 alokthakur142

@alokthakur142 Sure! Please see https://cube.dev/docs/reference/data-model/pre-aggregations#rollup and also https://cube.dev/docs/reference/data-model/pre-aggregations#rollup_join

igorlukanin avatar Feb 20 '24 12:02 igorlukanin

@igorlukanin - Do you have any solution available for this for this already?

alokthakur142 avatar May 19 '24 06:05 alokthakur142