lm-evaluation-harness
Evaluation of encoder and decoder models on SuperGLUE
Hi guys,
I want to evaluate models such as ModernBERT, Llama, and many others on SuperGLUE as well as my own benchmark. In my setting, every model has to be fine-tuned for the specific task, including decoder models; a minimal sketch of what I mean is below.
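For concreteness, here is a minimal sketch (assuming Hugging Face Transformers and Datasets, which the harness itself depends on) of the per-task fine-tuning I have in mind, shown on the BoolQ subtask. The checkpoint name and hyperparameters are placeholders, and the same `AutoModelForSequenceClassification` route also works for decoder models like Llama once a padding token is set:

```python
# Minimal sketch, not harness code: per-task fine-tuning on a SuperGLUE
# subtask with Hugging Face Transformers. Checkpoint, hyperparameters,
# and output path are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

checkpoint = "answerdotai/ModernBERT-base"  # assumed; swap in any encoder/decoder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("super_glue", "boolq")

def tokenize(batch):
    # BoolQ pairs a passage with a yes/no question; labels are 0/1.
    return tokenizer(batch["passage"], batch["question"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="boolq-finetune", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; add compute_metrics for accuracy
```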
Is this currently supported by the harness? From reading the code, my impression is that evaluation is done only by prompting (zero- or few-shot), with no fine-tuning step.
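For reference, the flow I do see supported is the prompting path, e.g. through the Python API. This is a sketch against v0.4.x; the checkpoint is a placeholder and the exact SuperGLUE task names may differ between harness versions:

```python
# What I currently see supported: zero-/few-shot prompting evaluation
# via the lm-eval Python API (v0.4.x). Task names and the checkpoint
# are assumptions and may differ across versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.2-1B",  # placeholder checkpoint
    tasks=["boolq", "cb", "copa", "rte"],  # SuperGLUE subtasks
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```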
Thanks.