Support caching by keeping cache volumes
Description
Some benchmarks need a large amount of their runtime to create the data necessary for their experiment. It would be easier if they had to create the data only once, especially when the same benchmark with the same configuration is executed several times within a short time frame, e.g., when executing a challenge. To enable this caching of generated data, we have to
- define rules for a benchmark to use the caching by
  - defining how a benchmark can request a cache volume,
  - defining where a benchmark has to store its data if it should be cached, and
  - defining how the benchmark can figure out with which parameterization the data in the cache volume has been created (see the sketch after this list).
- decide whether a cached volume should be deleted or not since we won't be able to cache all data of the benchmarks forever. (refs #94)
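One possible way to cover the last point would be a small metadata file kept inside the cache volume. This is only a sketch: the mount path, file format and the `CacheMetadata` class below are illustrative assumptions, not an existing platform API. The idea is simply that the benchmark records the parameterization that produced the data and compares it against its current parameterization before reusing the cache.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class CacheMetadata {

    // Assumed mount point of the cache volume inside the benchmark container.
    private static final Path METADATA_FILE = Paths.get("/hobbit/cache/parameters.properties");

    /** Records the parameterization that was used to generate the cached data. */
    public static void write(Properties identifyingParameters) throws IOException {
        Files.createDirectories(METADATA_FILE.getParent());
        try (OutputStream out = Files.newOutputStream(METADATA_FILE)) {
            identifyingParameters.store(out, "parameterization of the cached data");
        }
    }

    /** Returns true only if the cached data was generated with exactly these parameters. */
    public static boolean matches(Properties identifyingParameters) throws IOException {
        if (!Files.exists(METADATA_FILE)) {
            return false; // fresh or wiped volume: data has to be generated
        }
        Properties cached = new Properties();
        try (InputStream in = Files.newInputStream(METADATA_FILE)) {
            cached.load(in);
        }
        return cached.equals(identifyingParameters);
    }
}
```

A benchmark would call `matches(...)` at startup and only regenerate the data (and rewrite the file) on a mismatch or on an empty volume.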
Blocked by #95
Discussion
- Should we have one cache volume per benchmark and always mount that same volume? This could make it easier for us, since we wouldn't have to care whether the cache fits the needs of the current parameterization; the benchmark itself would have to decide whether it can reuse the data or not. On the other hand, this volume could become very large and might be deleted pretty fast.
A comment in regard to caching based on parameterization: the spring-batch framework uses the concept of identifying parameters to decide whether two job instances are equal for all practical purposes. For example, the amount of memory assigned to a database would be non-identifying, as it has no impact on what data will be loaded into the database.
https://docs.spring.io/spring-batch/reference/htmlsingle/#domainJobParameters
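To make the analogy concrete, here is a minimal sketch of how a cache key could be derived from the identifying parameters only. The `CacheKey` class, the hashing scheme and the parameter names in the usage comment are assumptions for illustration, not something the platform provides today.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CacheKey {

    /**
     * Builds a key from the identifying parameters only; non-identifying parameters
     * (e.g. the memory assigned to the database) do not change the key and therefore
     * do not prevent a cache hit.
     */
    public static String of(Map<String, String> allParameters, Set<String> identifyingNames) {
        // TreeMap sorts by parameter name so the key does not depend on insertion order.
        Map<String, String> identifying = new TreeMap<>();
        allParameters.forEach((name, value) -> {
            if (identifyingNames.contains(name)) {
                identifying.put(name, value);
            }
        });
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(identifying.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Hypothetical usage: the key is the same regardless of dbMemory, so the cache
    // can be reused when only the non-identifying memory setting changes.
    // of(Map.of("datasetSize", "10000", "seed", "42", "dbMemory", "8g"),
    //    Set.of("datasetSize", "seed"));
}
```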
In order to implement caching, I can think of a 'lazy-loading' approach, which I think would require introducing a new component to the platform:
- The benchmark controller sends a request like the following to a `DataPreparationService`: "I want to have a system from image `name-of-system-of-vendor-x` with `config-x`, loaded with the data from `data-generator-image-y` with `config-y`."
- Based on the identifying parameters, the `DataPreparationService` decides whether such a container already exists and, if so, clones it (i.e., creates a copy of its volumes); otherwise, it starts new containers from the system image and the data generator, runs the data generation and finally creates the 'cache container' (a sketch of this decision follows at the end of this comment).
- The system image must support being restarted and it must retain the data.
- This means that the DataPreparationService would take over some of the concerns which are currently handled by the BenchmarkController. Yet, it would be backward compatible in the sense that old components simply do not use that service.
- I think it is reasonable for the BenchmarkController to have control over deciding which parameters are identifying.
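For completeness, a rough sketch of the decision logic the proposed `DataPreparationService` could implement under the assumptions above; the class, method and volume names are hypothetical and the two private methods are placeholders for the actual container and volume handling.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentHashMap;

public class DataPreparationService {

    // Maps a key derived from the identifying parameters to the name of the
    // volume holding the data that was already generated for that key.
    private final Map<String, String> preparedVolumes = new ConcurrentHashMap<>();

    /**
     * Returns the name of a volume containing the requested data, either by cloning
     * an existing cache volume or by running the data generation once and caching it.
     */
    public String prepare(String systemImage, String dataGeneratorImage,
                          SortedMap<String, String> identifyingParameters) {
        // SortedMap gives a deterministic string, so equal parameterizations map to equal keys.
        String key = systemImage + "|" + dataGeneratorImage + "|" + identifyingParameters;
        String cached = preparedVolumes.get(key);
        if (cached != null) {
            // Cache hit: hand out a copy so the experiment cannot corrupt the cached data.
            return cloneVolume(cached);
        }
        // Cache miss: start system + data generator, run the generation, keep the volume.
        String generated = runGeneration(systemImage, dataGeneratorImage, identifyingParameters);
        preparedVolumes.put(key, generated);
        return cloneVolume(generated);
    }

    private String cloneVolume(String volumeName) {
        // Placeholder: would copy the Docker volume and return the copy's name.
        return volumeName + "-copy";
    }

    private String runGeneration(String systemImage, String dataGeneratorImage,
                                 SortedMap<String, String> identifyingParameters) {
        // Placeholder: would start the containers, wait for the data generation to
        // finish and return the name of the volume that now holds the generated data.
        return "cache-" + Integer.toHexString(identifyingParameters.hashCode());
    }
}
```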