Evaluation of encoder and decoder models on SuperGLUE
Hi guys,
I want to evaluate models like ModernBERT, Llama, and many others on SuperGLUE and on my own benchmark. In my setting, every model has to be fine-tuned for the specific task, even the decoder models.
Is this currently supported by LightEval? Looking at the code, my impression is that evaluations are only done by prompting.
Thanks.
Hey! Not sure I understand your use case. Do you want to evaluate using logprobs?
Hi. No, I want to apply supervised fine-tuning (https://huggingface.co/docs/transformers/en/tasks/sequence_classification) to each model and then evaluate it on each task.
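To make the setup concrete, here is a minimal sketch of the fine-tuning step I mean, following the linked guide. The checkpoint, the SuperGLUE task (BoolQ), and the hyperparameters are just placeholders, and the `super_glue` dataset path may differ depending on your `datasets` version:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"  # same flow works for decoder checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Decoder-only models (e.g. Llama) usually need a pad token for batching:
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("super_glue", "boolq")  # one SuperGLUE task as an example

def tokenize(batch):
    # BoolQ pairs a question with a passage
    return tokenizer(batch["question"], batch["passage"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="modernbert-boolq", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables padded batches via DataCollatorWithPadding
)
trainer.train()
```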
Alright! Then you can do those two steps separately: first fine-tune your model, save the weights, and upload them to the Hub; then evaluate it as you would any other model on the Hub.
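For example, continuing from the fine-tuning sketch above (the repo id is a placeholder, and the CLI line in the comment is illustrative only, since the exact flags and task strings vary across LightEval versions, so check the docs):

```python
# Upload the fine-tuned weights; the repo id is a placeholder.
model.push_to_hub("your-username/modernbert-boolq")
tokenizer.push_to_hub("your-username/modernbert-boolq")

# Then evaluate it like any other hub model, e.g. with the LightEval CLI
# (illustrative only; check the LightEval docs for your version's syntax):
#
#   lighteval accelerate \
#       "model_name=your-username/modernbert-boolq" \
#       "leaderboard|boolq|0|0"
```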