Cube Refresh Worker - Running as a lambda/As a scheduled job - delineation of schedule

Open strazto opened this issue 4 months ago • 3 comments

Problem

The Cube Refresh Worker's resource consumption is very bursty. When it runs its periodic refresh (roughly hourly), it requires a very large amount of compute resources. This "active" period tends to be brief.

The rest of the time, it requires far fewer compute resources and appears "idle".

When provisioning the cube stack, the refresh worker requires a relatively massive amount of CPU + RAM. This is mainly required to support the brief "active" period of the duty cycle.

Intention

Ideally, I'd like to be able to spin up the expensive refresh operation as a lambda / on-demand job, following a cron schedule, so that I don't need a large amount of compute provisioned during the "idle" period.

Assuming that nothing is happening during the "idle" period, this should be easy to do: run the expensive job and, when it terminates, spin down the pod.

Question

During the "idle" period, is the refresh worker still performing operations?

  • The documentation

    A Refresh Worker updates pre-aggregations and invalidates the in-memory cache in the background. They also keep the refresh keys up-to-date for all data models and pre-aggregations. Please note that the in-memory cache is just invalidated but not populated by Refresh Worker. In-memory cache is populated lazily during querying. On the other hand, pre-aggregations are eagerly populated and kept up-to-date by Refresh Worker.

    delineates that the in-memory cache is NOT populated by the refresh worker, but is instead populated lazily at runtime by some other service.

    • The documentation states that the refresh worker invalidates the cache and "eagerly" populates pre-aggregations.
    • It's not clear whether this "eager" operation refers to the "active" period of the duty cycle.

Is there a way to detect when the "active" period has ended?

I.e., a way to tell that the refresh job has finished and the pod can spin down.

strazto avatar Mar 05 '24 05:03 strazto

Hi @strazto 👋

> The Cube Refresh Worker's resource consumption is very bursty. When it runs its periodic refresh (roughly hourly), it requires a very large amount of compute resources. This "active" period tends to be brief.

You have full control over the refresh keys and you're not constrained to having every refresh key defined as "every 1 hour". You can use whatever schedule works for you, including using cron-based schedules: https://cube.dev/docs/reference/data-model/cube#refresh_key
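As a rough illustration of a cron-based refresh key, here is a minimal sketch. The cube name `orders`, the table name, the schedule, and the timezone are all made up for the example. The small `cube()` function at the top is only a stand-in so the snippet runs on its own; in a real data model file, Cube provides `cube()` itself. The `every` and `timezone` parameters are the ones described in the `refresh_key` reference linked above.

```typescript
// Stand-in for the global cube() function that Cube provides in data model
// files. It is defined here only so that this sketch is self-contained.
const definitions: Record<string, any> = {};
function cube(name: string, definition: Record<string, any>): void {
  definitions[name] = definition;
}

// Hypothetical cube: "orders" and the schedule below are illustrative only.
cube("orders", {
  sql_table: "orders",

  refresh_key: {
    // A cron expression instead of an interval such as "1 hour":
    // here, once a day at 03:00 in the given timezone.
    every: "0 3 * * *",
    timezone: "America/Los_Angeles",
  },
});
```

Moving the heavy refresh to a quieter time of day this way changes when the burst happens, not whether the worker has to stay up between bursts.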

> Run the expensive job, when it terminates, spin down the pod

As specified in the docs that you're quoting, the Refresh Worker is indeed active, invalidating cache entries, even when it's not building pre-aggregations. The current Cube architecture needs it to be active at all times.

I hope this helps.

igorlukanin avatar Mar 20 '24 15:03 igorlukanin

@strazto Did my advice above help?

igorlukanin avatar May 14 '24 10:05 igorlukanin

@igorlukanin Your advice was helpful, thank you 🙂

strazto avatar May 14 '24 13:05 strazto