
Predictive model to find optimal reduction parameters

Open balanz24 opened this issue 1 year ago • 4 comments

This is a possible solution to #418

Our model aims to predict the split_every value that makes the reduction as fast as possible. This parameter affects the input data size of each function, the total number of stages, and the number of functions per stage.
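To make the trade-off concrete, here is a minimal sketch (not cubed's actual planner) of how split_every shapes a tree reduction: each function combines up to split_every chunks per stage, so a larger value means fewer stages but more input data per function.

```python
import math

def reduction_shape(n_chunks: int, split_every: int):
    """Illustrative sketch: return the number of functions launched in
    each stage of a tree reduction over n_chunks input chunks, where
    each function combines up to split_every chunks."""
    stages = []
    remaining = n_chunks
    while remaining > 1:
        remaining = math.ceil(remaining / split_every)
        stages.append(remaining)
    return stages

# A larger split_every means fewer stages but more input per function.
print(reduction_shape(64, 2))  # [32, 16, 8, 4, 2, 1] -> 6 stages
print(reduction_shape(64, 4))  # [16, 4, 1]           -> 3 stages
print(reduction_shape(64, 8))  # [8, 1]               -> 2 stages
```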

Evaluation has only been done with Lithops, but it should be extended to other backends.

The model predicts 3 components of the execution time separately:

  • Invocation: The time between submitting the parallel map job and the functions starting to execute cubed code.
  • I/O: The time that functions spend reading and writing zarr files from/to object storage.
  • CPU: The CPU time that functions spend performing reduction computations.

Invocation and CPU times are easy to predict using linear regression, as they grow linearly with the size of the dataset being reduced. The I/O time is predicted using the primula-plug presented in the Primula paper.
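As a sketch of the linear-regression part, the following fits a line (time = a * size + b) to hypothetical profiling measurements for the invocation and CPU components; the sample values are illustrative, not real benchmark data.

```python
import numpy as np

# Hypothetical profiling measurements: per-function input size (MB)
# vs. observed invocation and CPU times (s). Values are illustrative.
sizes_mb = np.array([64, 128, 256, 512, 1024], dtype=float)
invoc_s  = np.array([1.90, 2.00, 2.20, 2.50, 3.10])
cpu_s    = np.array([0.10, 0.19, 0.41, 0.80, 1.62])

# Fit a line time = a * size + b for each component.
invoc_fit = np.polyfit(sizes_mb, invoc_s, 1)
cpu_fit   = np.polyfit(sizes_mb, cpu_s, 1)

def predict(fit, size_mb):
    """Evaluate the fitted line at a given input size."""
    a, b = fit
    return a * size_mb + b

# Interpolate the CPU time for an unseen 768 MB input.
print(predict(cpu_fit, 768))
```

The I/O component does not scale this simply (it depends on object-storage throughput and concurrency), which is why the issue uses the Primula model for it instead.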

Here we see a comparison of the real vs. predicted times in a 15 GB quadratic means test, measured using Lithops on AWS Lambda and S3.

As we can see, the model correctly predicts that split_every=4 gives the lowest execution time.

Some observations on the results:

  • Invocation overheads carry a very significant weight in the total time, but other backends remain to be evaluated to see whether theirs are lower.
  • Since the CPU time appears to be insignificant, the model could be integrated into cubed considering only I/O and invocation overheads.
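Putting the pieces together, the selection step can be sketched as follows: sum a predicted per-stage cost over all reduction stages for each candidate split_every, then pick the minimum. The cost function here is a toy stand-in (fixed invocation overhead plus I/O proportional to bytes read), not the fitted model from the issue.

```python
import math

def total_time(n_chunks, split_every, chunk_mb, stage_cost):
    """Hedged sketch of the model's selection step: sum the predicted
    per-stage time over all reduction stages, where stage_cost(size_mb)
    returns the predicted invocation + I/O time for one stage whose
    functions each read size_mb of input."""
    total, remaining = 0.0, n_chunks
    while remaining > 1:
        # Each function in this stage reads up to split_every chunks.
        per_func_mb = min(split_every, remaining) * chunk_mb
        total += stage_cost(per_func_mb)
        remaining = math.ceil(remaining / split_every)
    return total

# Toy cost model: 2 s invocation overhead plus I/O time proportional
# to the megabytes each function reads (values are made up).
cost = lambda mb: 2.0 + mb / 200

best = min(range(2, 17), key=lambda s: total_time(256, s, 100, cost))
print(best)
```

Small split_every values pay the fixed invocation overhead over many stages, while large values make each function read too much data, so the predicted total is minimized somewhere in between.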

balanz24 avatar May 14 '24 13:05 balanz24