Evals not showing grouping properly

Open: aliasaria opened this issue 5 months ago • 1 comment

  • Run hellaswag,piqa,winogrande using common-eleuther-ai-lm-eval-harness-mlx
  • See the attached video: some rows show three reported metrics, some two, and some only one

https://github.com/user-attachments/assets/4079a238-87a6-4ff2-b0d7-c7d0840b6386

aliasaria, Jul 10 '25 18:07

Adding a note here: this only happens when we run a comparison with the harness, because each metric has its own test set and those test sets differ in size. The actual task here is to determine, based on the plugin, which eval reports should be grouped and which should not.

deep1401, Jul 10 '25 18:07
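
To make the proposed task concrete, here is a rough sketch of plugin-aware grouping. It is not taken from the transformerlab-app codebase: the row fields (`plugin`, `test_case_id`, `metric`), the `UNGROUPED_PLUGINS` set, and the `group_eval_rows` helper are all hypothetical names, and treating the harness plugin as "do not group" is an assumption based on the comment above.

```python
from collections import defaultdict

# Hypothetical policy: plugins whose metrics come from test sets of
# different sizes, so their rows should not be merged into shared groups.
UNGROUPED_PLUGINS = {"common-eleuther-ai-lm-eval-harness-mlx"}


def group_eval_rows(rows):
    """Group report rows by (plugin, test case), unless the plugin is flagged.

    `rows` is a list of dicts with the assumed keys "plugin",
    "test_case_id", and "metric". Returns a list of groups (lists of rows).
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        if row["plugin"] in UNGROUPED_PLUGINS:
            # Include the row index in the key so each row is its own group.
            key = (row["plugin"], row["metric"], index)
        else:
            # Rows that share a test case are shown together.
            key = (row["plugin"], row["test_case_id"])
        groups[key].append(row)
    return list(groups.values())


rows = [
    {"plugin": "common-eleuther-ai-lm-eval-harness-mlx", "test_case_id": 0, "metric": "hellaswag"},
    {"plugin": "common-eleuther-ai-lm-eval-harness-mlx", "test_case_id": 0, "metric": "piqa"},
    {"plugin": "some-other-plugin", "test_case_id": 0, "metric": "accuracy"},
    {"plugin": "some-other-plugin", "test_case_id": 0, "metric": "fluency"},
]
grouped = group_eval_rows(rows)
```

With this policy the two harness rows stay separate while the two rows from the other (made-up) plugin are shown as one group. Where the per-plugin flag should actually live, for example in the plugin's own metadata, is left open.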