
Feat. Benchmark Sets

Open farook-edev opened this issue 1 month ago • 6 comments

This PR adds a new element (BenchmarkSet) which bundles together benchmarks that are mostly similar but need to be run separately (e.g. different models or datasets but the same function).

Under the hood the benchmarks work exactly the same; no C++ logic has been changed. The added configuration is only for the frontend.

The way it works is by bundling similar benchmarks under a set, with each benchmark active only when every option it requires is active. For example, take LLM and say we have 3 models and 3 dataset implementations to test (ModelA-DatasetB, ModelC-DatasetA, and so on); that's 9 benchmarks. Benchmark ModelA-DatasetC defines 2 required options, Model-A and Dataset-C, and the benchmark set contains 6 options in 2 categories: Models (A, B, C) and Datasets (A, B, C). If a user then enables Models A and C and Dataset A, the set automatically activates ModelA-DatasetA and ModelC-DatasetA and disables all the others.
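The selection rule above can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's actual code or API; the names `Benchmark`, `BenchmarkSet`, and `required_options` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Benchmark:
    # Hypothetical stand-in for one bundled benchmark.
    name: str
    required_options: frozenset  # e.g. {"Model-A", "Dataset-C"}

@dataclass
class BenchmarkSet:
    # Hypothetical set: bundles benchmarks and tracks which options the user enabled.
    benchmarks: list
    enabled_options: set = field(default_factory=set)

    def active_benchmarks(self):
        # A benchmark is active only when every option it requires is enabled.
        return [b for b in self.benchmarks
                if b.required_options <= self.enabled_options]

# Build the 9 Model x Dataset combinations from the example.
models = ["A", "B", "C"]
datasets = ["A", "B", "C"]
llm_set = BenchmarkSet(benchmarks=[
    Benchmark(f"Model{m}-Dataset{d}",
              frozenset({f"Model-{m}", f"Dataset-{d}"}))
    for m in models for d in datasets
])

# User enables Models A and C, and Dataset A:
llm_set.enabled_options = {"Model-A", "Model-C", "Dataset-A"}
active = [b.name for b in llm_set.active_benchmarks()]
# active == ["ModelA-DatasetA", "ModelC-DatasetA"]
```

Note that the activation test is just a subset check, so adding a fourth model or dataset only adds one option to its category rather than another row of near-duplicate benchmarks.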

The benefit of this approach is that instead of having 9 benchmarks that are basically the same, we have 1 set containing 6 options, while the core benchmarking code never sees the sets or options.

This PR also applies the implementation described above to image_classification_v2, combining the default and offline versions into a set and providing 2 options to enable and disable the benchmarks. This is only a secondary improvement, since the system is mainly meant to tidy up the (at least) 4 benchmarks that LLM will add.

I've also included a video of the system in action:

https://github.com/user-attachments/assets/9c833086-60fc-4d6f-a5bd-bf1bb10cab0a

Closes #1082

farook-edev · Jan 06 '26 00:01