
Issue with evaluating multiobjective benchmarks

Open thchang opened this issue 1 year ago • 4 comments

Hi all -- I am interested in running the multiobjective variation of this benchmark and encountered the following issues; please advise:

  1. For the hypervolume metric to be comparable across different solvers, everyone needs to use the same reference point. Therefore, https://automl.github.io/jahs_bench_201/evaluation_protocol should provide a recommended reference point for everyone to measure hypervolume with respect to. The reference point must be a finite performance value for each objective that it is impossible to do worse than.

I looked up the reference point used for your reported results for the multiobjective random sampling approach. In lines 19-20 of this file: https://github.com/automl/jahs_bench_201_experiments/blob/master/jahs_bench_201_experiments/analysis/leaderboard.py it looks like the reference point is calculated as the minimum of all observed values. Is that correct? Such a policy would indirectly reward methods that sample very bad configurations and penalize methods that never take a "bad" evaluation (I sketch this concern in code after the list below).

  2. Also in the same file, it looks like the two objectives for the multiobjective variation of the benchmarks are to maximize validation accuracy and minimize latency? It would be nice to clarify this somewhere in https://automl.github.io/jahs_bench_201/evaluation_protocol as well, as it was difficult to find (I show how I am currently extracting these objectives in the second sketch below).
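To make the reference-point concern from item 1 concrete, here is a minimal sketch of the effect using pymoo's hypervolume indicator. The front, the reference points, and the unit choices below are all made up for illustration and are not taken from the benchmark; pymoo's indicator assumes minimization, so accuracy is first converted to error (100 - accuracy):

```python
import numpy as np
from pymoo.indicators.hv import HV

# Hypothetical Pareto front in minimization form:
# (validation error in %, latency in ms) -- values are made up.
front = np.array([
    [5.0, 12.0],
    [6.5,  9.0],
    [9.0,  7.5],
])

# Fixed, agreed-upon reference point: a value per objective that no
# configuration can do worse than (e.g. 100% error, a generous latency cap).
hv_fixed = HV(ref_point=np.array([100.0, 50.0]))

# Reference point derived from a solver's own worst observations: a solver
# that happened to sample terrible configurations gets a more generous
# reference point, and therefore a larger hypervolume for the same front.
hv_from_observed = HV(ref_point=np.array([60.0, 30.0]))

print("HV w.r.t. fixed reference point:      ", hv_fixed(front))
print("HV w.r.t. observation-derived point:  ", hv_from_observed(front))
```

The two calls return different hypervolumes for the identical front, which is why a shared, fixed reference point seems necessary for the leaderboard comparison.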
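And regarding item 2, this is roughly how I am currently extracting the two objectives. The query pattern follows the README's minimal example as I understand it, and the metric names "valid-acc" and "latency" are my reading of the surrogate output, so please correct me if the intended objective pair or the metric keys are different:

```python
import jahs_bench

# Query the surrogate benchmark for a sampled configuration
# (following the README's minimal example, as I understand it).
benchmark = jahs_bench.Benchmark(task="cifar10", download=True)
config = benchmark.sample_config()
results = benchmark(config, nepochs=200)

# My assumption of the objective pair for the multiobjective variant:
# maximize validation accuracy, minimize latency. Converted here to a
# minimization pair for hypervolume computation.
metrics = results[200]
objectives = (100.0 - metrics["valid-acc"], metrics["latency"])
print(objectives)
```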

thchang · Apr 26 '23 22:04