
Trigger rebuild of specific partitions of partitioned pre-aggregations using the Orchestration API

Open siamak-haschemi opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.

We want to trigger a rebuild of specific partitions of partitioned pre-aggregations using the Orchestration API.

Describe the solution you'd like

Extend the cubejs-api/v1/pre-aggregations/jobs endpoint so that each entry in selector.preAggregations can be an object that accepts a list of partitions:

Example:

"selector": {
  "contexts": [{ "securityContext": {} }],
  "timezones": ["UTC"],
  "preAggregations": [
    {
      "name": "orders.main",
      "partitions": [....]
    }
  ]
}
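The proposal amounts to letting each `preAggregations` entry be either the existing string id or an object that carries partitions. A minimal TypeScript sketch of that shape follows. The type and function names are hypothetical, the object form is the proposal from this issue rather than an existing Orchestration API feature, and treating a bare string as "rebuild all partitions" is an assumed design choice. The sketch uses `id` as the key; the two payloads in this thread use `name` and `id` respectively.

```typescript
// Hypothetical model of the selector proposed in this issue for
// POST /cubejs-api/v1/pre-aggregations/jobs.

// Existing form: a pre-aggregation referenced by its id, e.g. "orders.main".
type PreAggregationRef = string;

// Proposed form (not part of the current API): an id plus the partitions to rebuild.
interface PartitionedPreAggregationRef {
  id: string;
  partitions: string[];
}

interface JobSelector {
  contexts: { securityContext: Record<string, unknown> }[];
  timezones: string[];
  preAggregations?: (PreAggregationRef | PartitionedPreAggregationRef)[];
}

type NormalizedRef = { id: string; partitions: string[] | "all" };

// Maps both forms onto one shape. Assumption: a bare string id keeps
// today's behavior, i.e. every partition of that pre-aggregation is rebuilt.
function normalizePreAggregations(selector: JobSelector): NormalizedRef[] {
  return (selector.preAggregations ?? []).map((ref) =>
    typeof ref === "string"
      ? { id: ref, partitions: "all" as const }
      : { id: ref.id, partitions: ref.partitions },
  );
}
```

Keeping the string form valid means existing clients would not need to change; only callers that want partition-level rebuilds would send the object form.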

Describe alternatives you've considered

There is no real alternative for building specific partitions.

Additional context

In our pipeline (dbt), we know which partition became stale, so Cube only needs to rebuild the pre-aggregation cache for that specific partition.
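Because the pipeline already knows which rows changed, the stale partitions can be derived from the changed timestamps. The sketch below assumes daily partitions keyed by UTC date and reuses the `<schema>.<prefix><YYYYMMDD>` naming seen in the example payload later in this thread. That pattern and the helper names are assumptions; the exact partition identifier Cube would expect is not specified in this issue.

```typescript
// Hypothetical helpers: turn timestamps of rows a dbt run changed into the
// daily partitions that need a rebuild. Assumes UTC day partitions.

function utcDayKey(ts: Date): string {
  // toISOString() is always UTC, e.g. "2024-03-27T14:03:00.000Z" -> "20240327"
  return ts.toISOString().slice(0, 10).replace(/-/g, "");
}

// Assumed naming pattern <schema>.<tablePrefix><YYYYMMDD>, inferred from the
// example payload in this thread; not a documented Cube convention.
function partitionName(schema: string, tablePrefix: string, day: string): string {
  return `${schema}.${tablePrefix}${day}`;
}

function stalePartitions(changed: Date[], schema: string, tablePrefix: string): string[] {
  const days = new Set(changed.map(utcDayKey)); // one entry per day, duplicates dropped
  return Array.from(days)
    .sort() // YYYYMMDD strings sort chronologically
    .map((day) => partitionName(schema, tablePrefix, day));
}
```

For example, changes at `2024-03-26T23:59:59Z` and `2024-03-27T14:03:00Z` map to the `...orders_main20240326` and `...orders_main20240327` partitions, so only those two would be rebuilt.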

Posted by siamak-haschemi on Jun 14 '23

Is this related? https://cube.dev/docs/guides/recipes/query-acceleration/refreshing-select-partitions

I'm thinking that triggering the refresh schedule via the API might bypass the refresh key, so it isn't helpful as a workaround?

Posted by dfagnan on Aug 19 '23

If you are interested in working on this issue, please go ahead and open a PR for it. We'd be happy to review and merge it. If this is the first time you are contributing a Pull Request to Cube, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube Slack.

Posted by github-actions[bot] on Mar 26 '24

@dfagnan, the link you sent might work for some use cases, but not in ours. Today's data is updated every 15 minutes by an ETL job, and we also update the whole table (all partitions) once a day in the morning. Right now, Cube allows only one refresh schedule, so for data consistency we chose to refresh the whole pre-aggregation (all partitions) on that schedule, which means all the data is refreshed every 15 minutes. But only the most recent data changes regularly, so this is a waste of resources!

I've seen that the cubejs-system API already implements this, because you can manually select and rebuild partitions of a pre-aggregation from the Cube Cloud UI. The idea would be to turn the "preAggregations" string array into an array of objects with "id" and "partitions". For example, for a pre-aggregation partitioned by day, the payload would be:

{
  "action": "post",
  "selector": {
    "contexts": [{ "securityContext": {} }],
    "timezones": ["UTC"],
    "preAggregations": [
      {
        "id": "orders.main",
        "partitions": ["prod_pre_aggregations.orders_main20240327"]
      }
    ]
  }
}

as suggested by @siamak-haschemi above.
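A client could submit that payload as sketched below. It targets the proposed format, so the `partitions` field would only take effect once something like this proposal is implemented. The helper names, base URL, and token are placeholders, and `fetch` is injected so that Node 18+'s global `fetch` (or a test double) can be passed in.

```typescript
// Sketch: submit the payload proposed in this thread to
// POST /cubejs-api/v1/pre-aggregations/jobs.
// ASSUMPTION: "partitions" is the proposed field, not part of the current API.

interface PartitionJobRequest {
  action: "post";
  selector: {
    contexts: { securityContext: Record<string, unknown> }[];
    timezones: string[];
    preAggregations: { id: string; partitions: string[] }[];
  };
}

// Minimal fetch-compatible signature; the global fetch in Node 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the same payload as the example above for one pre-aggregation.
function buildPartitionJob(id: string, partitions: string[]): PartitionJobRequest {
  return {
    action: "post",
    selector: {
      contexts: [{ securityContext: {} }],
      timezones: ["UTC"],
      preAggregations: [{ id, partitions }],
    },
  };
}

async function submitPartitionJob(
  fetchFn: FetchLike,
  cubeUrl: string, // placeholder, e.g. "http://localhost:4000"
  token: string, // API token for the Cube deployment
  job: PartitionJobRequest,
): Promise<unknown> {
  const res = await fetchFn(`${cubeUrl}/cubejs-api/v1/pre-aggregations/jobs`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: token },
    body: JSON.stringify(job),
  });
  if (!res.ok) {
    throw new Error(`Orchestration API returned HTTP ${res.status}`);
  }
  return res.json(); // parsed JSON response body
}
```

Usage might look like `await submitPartitionJob(fetch, "http://localhost:4000", token, buildPartitionJob("orders.main", ["prod_pre_aggregations.orders_main20240327"]))`, for example from a dbt post-run hook.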

Posted by AmandineScopely on Mar 27 '24