starcoder
starcoder copied to clipboard
Is there a script for evaluating against eleutherAI’s language model evaluation harness?
Hello, I want to reproduce the lm evaluation harness results reported in the blog. Since the prompts need to be formatted with the user, assistant, system, end tokens, the evaluation harness does not work out of the box. I'm wondering if the team can share the script used to report the results in the table!