Support caching by keeping cache volumes
Description
Some benchmarks need a large amount of their runtime to create the data necessary for their experiment. It would be easier if they had to create the data only once, especially when the same benchmark with the same configuration is executed several times within a short time frame, e.g., when executing a challenge. To enable this caching of generated data, we have to
- define rules for a benchmark to use the caching by
  - defining how a benchmark can request a cache volume,
  - defining where a benchmark has to store its data if it should be cached, and
  - defining how the benchmark can figure out with which parameterization the data in the cache volume has been created (see the sketch after this list).
- decide whether a cached volume should be deleted or not since we won't be able to cache all data of the benchmarks forever. (refs #94)
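One possible way to cover the last point would be a small metadata file kept inside the cache volume. This is only a sketch: the mount path, file format and the `CacheMetadata` class below are illustrative assumptions, not an existing platform API. The idea is simply that the benchmark records the parameterization that produced the data and compares it against its current parameterization before reusing the cache.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class CacheMetadata {

    // Assumed mount point of the cache volume inside the benchmark container.
    private static final Path METADATA_FILE = Paths.get("/hobbit/cache/parameters.properties");

    /** Records the parameterization that was used to generate the cached data. */
    public static void write(Properties identifyingParameters) throws IOException {
        Files.createDirectories(METADATA_FILE.getParent());
        try (OutputStream out = Files.newOutputStream(METADATA_FILE)) {
            identifyingParameters.store(out, "parameterization of the cached data");
        }
    }

    /** Returns true only if the cached data was generated with exactly these parameters. */
    public static boolean matches(Properties identifyingParameters) throws IOException {
        if (!Files.exists(METADATA_FILE)) {
            return false; // fresh or wiped volume: data has to be generated
        }
        Properties cached = new Properties();
        try (InputStream in = Files.newInputStream(METADATA_FILE)) {
            cached.load(in);
        }
        return cached.equals(identifyingParameters);
    }
}
```

A benchmark would call `matches(...)` at startup and only regenerate the data (and rewrite the file) on a mismatch or on an empty volume.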
Blocked by #95
Discussion
- Should we have one cache volume per benchmark and always mount that same volume? This could make it easier for us, since we wouldn't have to care whether the cache fits the needs of the current parameterization; the benchmark itself would have to decide whether it can reuse the data or not. On the other hand, this volume could become very large and might be deleted pretty fast.
A comment in regard to caching based on parameterization: the spring-batch framework uses the concept of identifying parameters to decide whether two job instances are equal for all practical purposes. For example, the amount of memory assigned to a database would be non-identifying, as it has no impact on what data will be loaded into the database.
https://docs.spring.io/spring-batch/reference/htmlsingle/#domainJobParameters
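To make the analogy concrete, here is a minimal sketch of how a cache key could be derived from the identifying parameters only. The `CacheKey` class, the hashing scheme and the parameter names in the usage comment are assumptions for illustration, not something the platform provides today.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CacheKey {

    /**
     * Builds a key from the identifying parameters only; non-identifying parameters
     * (e.g. the memory assigned to the database) do not change the key and therefore
     * do not prevent a cache hit.
     */
    public static String of(Map<String, String> allParameters, Set<String> identifyingNames) {
        // TreeMap sorts by parameter name so the key does not depend on insertion order.
        Map<String, String> identifying = new TreeMap<>();
        allParameters.forEach((name, value) -> {
            if (identifyingNames.contains(name)) {
                identifying.put(name, value);
            }
        });
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(identifying.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Hypothetical usage: the key is the same regardless of dbMemory, so the cache
    // can be reused when only the non-identifying memory setting changes.
    // of(Map.of("datasetSize", "10000", "seed", "42", "dbMemory", "8g"),
    //    Set.of("datasetSize", "seed"));
}
```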
In order to implement caching, I can think of a 'lazy-loading' approach, which I think would require introducing a new component to the platform:
- The benchmark controller sends a request like the following to a `DataPreparationService`: "I want to have a system from image `name-of-system-of-vendor-x` with `config-x`, loaded with the data from `data-generator-image-y` with `config-y`."
- Based on the identifying parameters, the `DataPreparationService` decides whether such a container already exists and, if so, clones it (i.e., creates a copy of its volumes); otherwise, it starts new containers from the system image and the data generator, runs the data generation and finally creates the 'cache container' (a sketch of this decision follows at the end of this comment).
- The system image must support being restarted and it must retain the data.
- This means that the DataPreparationService would take over some of the concerns which are currently handled by the BenchmarkController. Yet, it would be backward compatible in the sense that old components simply do not use that service.
- I think it is reasonable for the BenchmarkController to have control over deciding which parameters are identifying.
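For completeness, a rough sketch of the decision logic the proposed `DataPreparationService` could implement under the assumptions above; the class, method and volume names are hypothetical and the two private methods are placeholders for the actual container and volume handling.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentHashMap;

public class DataPreparationService {

    // Maps a key derived from the identifying parameters to the name of the
    // volume holding the data that was already generated for that key.
    private final Map<String, String> preparedVolumes = new ConcurrentHashMap<>();

    /**
     * Returns the name of a volume containing the requested data, either by cloning
     * an existing cache volume or by running the data generation once and caching it.
     */
    public String prepare(String systemImage, String dataGeneratorImage,
                          SortedMap<String, String> identifyingParameters) {
        // SortedMap gives a deterministic string, so equal parameterizations map to equal keys.
        String key = systemImage + "|" + dataGeneratorImage + "|" + identifyingParameters;
        String cached = preparedVolumes.get(key);
        if (cached != null) {
            // Cache hit: hand out a copy so the experiment cannot corrupt the cached data.
            return cloneVolume(cached);
        }
        // Cache miss: start system + data generator, run the generation, keep the volume.
        String generated = runGeneration(systemImage, dataGeneratorImage, identifyingParameters);
        preparedVolumes.put(key, generated);
        return cloneVolume(generated);
    }

    private String cloneVolume(String volumeName) {
        // Placeholder: would copy the Docker volume and return the copy's name.
        return volumeName + "-copy";
    }

    private String runGeneration(String systemImage, String dataGeneratorImage,
                                 SortedMap<String, String> identifyingParameters) {
        // Placeholder: would start the containers, wait for the data generation to
        // finish and return the name of the volume that now holds the generated data.
        return "cache-" + Integer.toHexString(identifyingParameters.hashCode());
    }
}
```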