Clémentine Fourrier
Theoretically, everything up to 1.0.0 is considered unstable and prone to change at any time:
> Major version zero (0.y.z) is for initial development. Anything MAY change at any time....
If the tokenizer prepends `_` as a start-of-word (SOW) token, single-token evals will fail. Reported by @anton-l
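A minimal sketch of the failure mode, using a toy tokenizer rather than any real library: `toy_encode`, `is_single_token`, and the `strip_sow` flag are hypothetical names introduced only for illustration, and stripping the marker is one possible workaround, not necessarily the fix adopted here.

```python
SOW = "_"  # start-of-word marker, as described in the report (assumed to be its own token)


def toy_encode(text):
    """Toy tokenizer: emits a standalone SOW token, then the text as one token."""
    return [SOW, text]


def is_single_token(tokens, strip_sow=False):
    """Check whether a continuation encodes to exactly one token.

    With strip_sow=True, a leading SOW marker is ignored before counting.
    """
    if strip_sow and tokens and tokens[0] == SOW:
        tokens = tokens[1:]
    return len(tokens) == 1


choice_tokens = toy_encode("A")
naive_check = is_single_token(choice_tokens)            # counts the SOW token too
tolerant_check = is_single_token(choice_tokens, True)   # ignores the SOW token
```

The naive check counts the marker as part of the answer, so a one-token choice such as "A" looks like two tokens.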
At the moment, we support chat templates (which need to be edited for multichoice), but not chain-of-thought (CoT). It would be nice to add.
At the moment, following the harness, TruthfulQA hardcodes the few-shot samples. We should instead re-upload the dataset with the few-shot samples on the side, and use our normal mechanism for...
- Add more docs
- Move `os.environ["TOKENIZERS_PARALLELISM"] = "false"` to the main scripts.
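As a rough sketch of what that move could look like, the variable can be set once at the top of an entry-point script, before any tokenizer is used. The `main` function below is a hypothetical placeholder, not the project's actual script.

```python
import os

# Set once in the entry point, before any tokenizer is used,
# instead of setting it inside library modules.
os.environ["TOKENIZERS_PARALLELISM"] = "false"


def main():
    # Hypothetical placeholder for the script's real work (loading models, running evals).
    return os.environ["TOKENIZERS_PARALLELISM"]


if __name__ == "__main__":
    main()
```

Keeping the assignment in the main scripts means that importing the library does not silently change the caller's environment.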
See for example https://github.com/wiskojo/lm-evaluation-harness/blob/60c3d381b893b164be0d919d3e9992a6c0fe6ce3/lm_eval/tasks/ifeval/instructions.py
Hi, What's the license of your library?
Ready for a light review
This PR does 2 things:
- introduce a programmatic interface with a Pipeline object which should allow users to call the models more easily (also removes the evaluator since most...