
Prompt format for different models

Mooler0410 opened this issue 1 year ago · 2 comments

Hi! I have read the code for open-source model evaluation. I noticed that, unlike some existing benchmarks such as LongBench or L-Eval, there is no prompt-customization step for different models (e.g. the prompt format of the Vicuna series differs from that of the original LLaMA-2). For a fair comparison, do you think such customization should be added to the code?
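For concreteness, here is a minimal sketch of the kind of per-model customization meant here. The template strings follow the published Vicuna and LLaMA-2 chat formats, but the dictionary and helper function are illustrative, not taken from the LooGLE code:

```python
# Hypothetical sketch: per-model prompt templates.
# Template strings follow the published Vicuna / LLaMA-2 chat formats;
# the mapping and helper are illustrative, not from the LooGLE repo.
MODEL_PROMPT_TEMPLATES = {
    # Vicuna-style conversation format
    "vicuna": (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's "
        "questions. USER: {instruction}\n{input} ASSISTANT:"
    ),
    # LLaMA-2 chat format with [INST] / <<SYS>> tags
    "llama2-chat": "[INST] <<SYS>>\n{instruction}\n<</SYS>>\n\n{input} [/INST]",
    # Plain format for base (non-chat) models
    "default": "{instruction}\n\n{input}",
}

def build_prompt(model_name: str, instruction: str, context: str) -> str:
    """Wrap a task instruction and its long-document context in the model's expected chat format."""
    template = MODEL_PROMPT_TEMPLATES.get(model_name, MODEL_PROMPT_TEMPLATES["default"])
    return template.format(instruction=instruction, input=context)
```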

Mooler0410 avatar Dec 16 '23 10:12 Mooler0410

Hi, we agree that dedicatedly customized prompting for different models helps, since it can unlock a model's potential through more standardized output formats and thus yield better performance during assessment.

As far as we know, LongBench designed different instructions for different datasets/tasks rather than for different models. In our case, we selected the most popular and common NLP tasks (summarization, QA) for evaluation. These tasks impose no strict requirements on the output format, although we did design prompts adaptively for the cloze tasks to ensure a fair comparison; a sketch of this task-level design follows below.
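To make the task-level design concrete, here is a minimal sketch of dataset/task-specific instructions. The keys, wording, and output format below are illustrative placeholders, not the actual LooGLE prompts:

```python
# Hypothetical sketch: one instruction per task rather than per model
# (keys and wording are illustrative, not copied from the LooGLE repo).
from typing import Optional

TASK_INSTRUCTIONS = {
    "summarization": "Please summarize the following long document.",
    "qa": "Please answer the question based on the long document below.",
    # Cloze tasks get an adaptive, format-constrained prompt so that
    # outputs can be scored fairly across models.
    "cloze": (
        "Fill in each <mask-k> placeholder with the correct entity. "
        "Output answers as: <mask-0>: answer, <mask-1>: answer, ..."
    ),
}

def make_task_prompt(task: str, document: str, question: Optional[str] = None) -> str:
    """Compose the task instruction with the document (and the question, for QA)."""
    prompt = TASK_INSTRUCTIONS[task] + "\n\n" + document
    if question is not None:
        prompt += "\n\nQuestion: " + question + "\nAnswer:"
    return prompt
```

Under this design, the same task prompt is fed to every model, so differences in scores reflect the models rather than model-specific prompt engineering.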

lijiaqijane avatar Jan 17 '24 10:01 lijiaqijane

Thanks for the clarification!

Mooler0410 avatar Jan 17 '24 19:01 Mooler0410