green-reviews-tooling
[ACTION] Proposal 4: Benchmarking investigation
Task Description
Parent Issue: https://github.com/cncf-tags/green-reviews-tooling/issues/83
This issue is about structuring a proposal for an investigation into the possible benchmarking strategies we could choose from.
Current state
For our first benchmark, with Falco, we let the end user of the review (in this case the Falco project) choose their own benchmarking.
You can check the implementation details here: there is a GitRepository ref pointing to a repo that Falco set up to (see the sketch after this list):
- Correctly deploy the Falco daemon with the needed ConfigMap
- Set up the event generator and benchmarking frameworks, such as stress-ng and redis-benchmark
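For context, this wiring roughly follows the usual Flux pattern of a GitRepository source plus a Kustomization that applies manifests from it. The sketch below is illustrative only; the resource names, URL, and path are assumptions, not the actual manifests in use.

```yaml
# Illustrative sketch of the Flux wiring; names, URL, and path are assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: falco-benchmark          # hypothetical name
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example/falco-benchmark   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: falco-benchmark
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: falco-benchmark
  path: ./benchmark              # hypothetical path containing the Falco DaemonSet config,
                                 # stress-ng jobs, and redis-benchmark jobs
  prune: true
```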
In short, all the benchmarking is in the hands of the project that wants the review, and this might not be ideal for the future.
Please also note that the current setup mixes benchmarking techniques, such as the stress-ng framework and a synthetic data generator (this is due to the nature of Kepler's requirements on the simulation environment).
Desired state
There are a couple of arguments for structuring this a bit differently.
- We should control the benchmarking
Why?
- If every project sets up its own benchmarking, it will not be easy to compare different green reviews. Assuming we keep the SCI score as an output metric, scores obtained from different benchmarking strategies might be difficult to compare (see the formula sketched after this list).
- Another argument is that a project could, in the future, try to set up an ad-hoc benchmark that minimizes its carbon footprint to gain a better score, and this is not ideal.
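To make the comparability concern concrete, recall the SCI formula from the Green Software Foundation specification (stated here from memory of the spec, not taken from this repo's docs):

$$\mathrm{SCI} = \frac{(E \times I) + M}{R}$$

where E is the energy consumed by the software, I the carbon intensity of the electricity, M the embodied emissions, and R the functional unit. Both E and R depend on the workload, so two projects benchmarked with different workloads produce scores against different baselines that cannot be meaningfully compared.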
Some open questions:
- What kind of benchmarking do we need?
- What is a "good" objective for a benchmark? Should it reach some hardware target, like memory utilization or CPU utilization? (A parameterization sketch follows this list.)
- How long should a benchmark run?
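To make the last two questions concrete: with a load generator like stress-ng, hardware targets and duration are simply parameters of the run. A purely illustrative Kubernetes Job (image, namespace, and target values are all assumptions) might look like this:

```yaml
# Purely illustrative; not an existing manifest in this repo.
apiVersion: batch/v1
kind: Job
metadata:
  name: baseline-stress          # hypothetical name
  namespace: benchmark           # hypothetical namespace
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: stress-ng
          image: ghcr.io/colinianking/stress-ng   # assumed image; pin a digest in practice
          args:
            - --cpu=4            # hardware target: 4 CPU workers
            - --cpu-load=80      # aim for roughly 80% CPU utilization
            - --vm=2             # 2 virtual-memory workers
            - --vm-bytes=75%     # target ~75% of available memory
            - --timeout=15m      # benchmark duration
            - --metrics-brief    # print per-stressor metrics at the end
```

Whether 80% CPU or 75% memory is a "good" target, and whether 15 minutes is long enough for the energy measurements to stabilize, is exactly what this investigation should decide.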
Some other considerations:
While it is good to have a standard approach to benchmarking, some projects (like Falco) might have specific benchmarking needs (e.g. Falco needed a given kernel event rate to show production-like behavior).
- How do we handle such cases? Should we even allow this?
A no-brainer answer might be:
- We define a set of standard benchmarks
- We let the user configure an additional benchmark on top
But then we might end up in a situation where we don't have the same tests for all projects. So what should we do? This investigation proposal should produce a set of more fine-grained investigation issues that give us a direction.
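As one possible shape for the "standard set plus optional extras" idea, a hypothetical configuration could separate the two explicitly. The schema and field names below do not exist in the tooling today; they are assumptions for illustration only.

```yaml
# Hypothetical schema for illustration only; nothing here exists in the tooling yet.
benchmarks:
  standard:                      # maintained centrally in this repo, identical for every project
    - name: cpu-memory-baseline
      tool: stress-ng
      duration: 15m
    - name: request-throughput-baseline
      tool: redis-benchmark
      duration: 10m
  projectSpecific:               # optional, owned by the project under review
    - name: kernel-event-rate
      tool: event-generator      # e.g. Falco's synthetic event generator
      duration: 15m
```

Only the standard section would feed the cross-project comparison; the project-specific section could still be reported but kept clearly separate, which may be one way to reconcile comparability with project-specific needs.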
Goals to achieve
- [ ] Draft the proposal for the investigation
- [ ] Review proposal and merge
Nice to have
- [ ] Find experts in benchmarking