Evaluation of encoder and decoder models on SuperGLUE
Hi guys,
I want to evaluate models like ModernBERT, Llama, and many others on SuperGLUE and on my own benchmark. In my setting, every model has to be fine-tuned for the specific task, even the decoder models.
Is this currently supported by LightEval? Looking at the code, my impression is that evaluations are only done by prompting.
Thanks.
Hey! Not sure I understand your use case. Do you want to evaluate using logprobs?
Hi. No, I want to apply supervised fine-tuning (https://huggingface.co/docs/transformers/en/tasks/sequence_classification) to each model and then evaluate it on each task.
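To make the setup concrete, here is a minimal sketch of the fine-tuning step I mean, following the linked guide. The checkpoint, the SuperGLUE task (BoolQ), and the hyperparameters are just placeholders, and the `super_glue` dataset path may differ depending on your `datasets` version:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"  # same flow works for decoder checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Decoder-only models (e.g. Llama) usually need a pad token for batching:
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("super_glue", "boolq")  # one SuperGLUE task as an example

def tokenize(batch):
    # BoolQ pairs a question with a passage
    return tokenizer(batch["question"], batch["passage"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="modernbert-boolq", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables padded batches via DataCollatorWithPadding
)
trainer.train()
```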
Alright! Then you can do those two steps separately: first fine-tune your model, save the weights, and upload them to the Hub; then evaluate it as you would any other model on the Hub.
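For example, continuing from the fine-tuning sketch above (the repo id is a placeholder, and the CLI line in the comment is illustrative only, since the exact flags and task strings vary across LightEval versions, so check the docs):

```python
# Upload the fine-tuned weights; the repo id is a placeholder.
model.push_to_hub("your-username/modernbert-boolq")
tokenizer.push_to_hub("your-username/modernbert-boolq")

# Then evaluate it like any other hub model, e.g. with the LightEval CLI
# (illustrative only; check the LightEval docs for your version's syntax):
#
#   lighteval accelerate \
#       "model_name=your-username/modernbert-boolq" \
#       "leaderboard|boolq|0|0"
```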