green-reviews-tooling
[ACTION] Proposal 4: Benchmarking investigation
Task Description
Parent Issue: https://github.com/cncf-tags/green-reviews-tooling/issues/83
This issue is about structuring a proposal for an investigation into the possible benchmarking strategies we could choose from.
Current state
For our first benchmark, with Falco, we let the end user of the review (in this case the Falco project) choose their own benchmarking.
You can check the implementation details here: there is a GitRepository ref pointing to a repo that Falco set up to (see the sketch after this list):
- Correctly deploy the Falco daemon with the needed ConfigMap
- Set up the event generator and benchmarking frameworks, such as stress-ng and redis-benchmark
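For context, this wiring roughly follows the usual Flux pattern of a GitRepository source plus a Kustomization that applies manifests from it. The sketch below is illustrative only; the resource names, URL, and path are assumptions, not the actual manifests in use.

```yaml
# Illustrative sketch of the Flux wiring; names, URL, and path are assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: falco-benchmark          # hypothetical name
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example/falco-benchmark   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: falco-benchmark
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: falco-benchmark
  path: ./benchmark              # hypothetical path containing the Falco DaemonSet config,
                                 # stress-ng jobs, and redis-benchmark jobs
  prune: true
```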
In short, all the benchmarking is in the hands of the project that wants the review, and this might not be ideal for the future.
Please also note that the current setup mixes benchmarking techniques, such as the stress-ng framework and a synthetic data generator (this is due to the nature of Kepler's requirements on the simulation environment).
Desired state
There are a couple of arguments for structuring this a bit differently.
- We should control the benchmarking
Why?
- If every project sets up its own benchmarking, it will not be easy to compare different green reviews. Assuming we keep the SCI score as an output metric, scores obtained from different benchmarking strategies might be difficult to compare (see the formula sketched after this list).
- Another argument is that a project could, in the future, try to set up an ad-hoc benchmark that minimizes its carbon footprint to gain a better score, and this is not ideal.
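To make the comparability concern concrete, recall the SCI formula from the Green Software Foundation specification (stated here from memory of the spec, not taken from this repo's docs):

$$\mathrm{SCI} = \frac{(E \times I) + M}{R}$$

where E is the energy consumed by the software, I the carbon intensity of the electricity, M the embodied emissions, and R the functional unit. Both E and R depend on the workload, so two projects benchmarked with different workloads produce scores against different baselines that cannot be meaningfully compared.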
Some open questions:
- What kind of benchmarking do we need?
- What is a "good" objective for a benchmark? Should it reach some hardware target, like memory utilization or CPU utilization? (A parameterization sketch follows this list.)
- How long should a benchmark run?
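To make the last two questions concrete: with a load generator like stress-ng, hardware targets and duration are simply parameters of the run. A purely illustrative Kubernetes Job (image, namespace, and target values are all assumptions) might look like this:

```yaml
# Purely illustrative; not an existing manifest in this repo.
apiVersion: batch/v1
kind: Job
metadata:
  name: baseline-stress          # hypothetical name
  namespace: benchmark           # hypothetical namespace
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: stress-ng
          image: ghcr.io/colinianking/stress-ng   # assumed image; pin a digest in practice
          args:
            - --cpu=4            # hardware target: 4 CPU workers
            - --cpu-load=80      # aim for roughly 80% CPU utilization
            - --vm=2             # 2 virtual-memory workers
            - --vm-bytes=75%     # target ~75% of available memory
            - --timeout=15m      # benchmark duration
            - --metrics-brief    # print per-stressor metrics at the end
```

Whether 80% CPU or 75% memory is a "good" target, and whether 15 minutes is long enough for the energy measurements to stabilize, is exactly what this investigation should decide.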
Some other considerations:
While it is good to have a standard approach to benchmarking, some projects (like Falco) might have specific benchmarking needs (e.g. Falco needed a given kernel event rate to show production-like behavior).
- How do we handle such cases? Should we even allow this?
A no-brainer answer might be:
- We define a set of standard benchmarks
- We let the user configure an additional benchmark on top
But then we might end up in a situation where we don't have the same tests for all projects. So what should we do? This investigation proposal should produce a set of more fine-grained investigation issues that give us a direction.
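As one possible shape for the "standard set plus optional extras" idea, a hypothetical configuration could separate the two explicitly. The schema and field names below do not exist in the tooling today; they are assumptions for illustration only.

```yaml
# Hypothetical schema for illustration only; nothing here exists in the tooling yet.
benchmarks:
  standard:                      # maintained centrally in this repo, identical for every project
    - name: cpu-memory-baseline
      tool: stress-ng
      duration: 15m
    - name: request-throughput-baseline
      tool: redis-benchmark
      duration: 10m
  projectSpecific:               # optional, owned by the project under review
    - name: kernel-event-rate
      tool: event-generator      # e.g. Falco's synthetic event generator
      duration: 15m
```

Only the standard section would feed the cross-project comparison; the project-specific section could still be reported but kept clearly separate, which may be one way to reconcile comparability with project-specific needs.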
Goals to achieve
- [ ] Draft the proposal for the investigation
- [ ] Review proposal and merge
Nice to have
- [ ] Find experts in benchmarking