Koan-Sin Tan

* Select the benchmark, not the datasets, from the UI (datasets should not be selectable). E.g., assuming we have both ifeval and tinymmlu as planned, they are not supposed to be selectable...