Easy-Transformer
Add helper function to run HuggingFace evals on HookedTransformer
Pick some example evals (e.g. PIQA, TriviaQA, LAMBADA) and write code to run HookedTransformer on them: https://huggingface.co/docs/evaluate/index
A demo notebook doing this for specific benchmarks would be a good MVP; a bonus would be a generic function for any eval (or, e.g., one for multiple-choice evals vs. other types).
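As a rough sketch of what the generic multiple-choice path could look like: score each answer choice by the log-probability the model assigns to it given the context, and pick the argmax. The helper names here (`sequence_logprob`, `multiple_choice_accuracy`) are hypothetical, and the model interface is a simplified stand-in (token-id lists and per-position log-prob rows rather than HookedTransformer's actual tensors):

```python
def sequence_logprob(model, context: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` given `context`.

    Assumes `model.to_tokens(text)` returns a list of token ids and calling
    `model(tokens)` returns one log-prob row per position, indexable by
    token id (illustrative interface; adapt to HookedTransformer's tensors).
    """
    ctx = model.to_tokens(context)
    full = model.to_tokens(context + continuation)
    logprobs = model(full)
    total = 0.0
    for pos in range(len(ctx), len(full)):
        # the token at `pos` is predicted from position pos - 1
        total += logprobs[pos - 1][full[pos]]
    return total


def multiple_choice_accuracy(model, examples) -> float:
    """`examples` is an iterable of (context, choices, correct_index)."""
    correct = total = 0
    for context, choices, answer in examples:
        scores = [sequence_logprob(model, context, c) for c in choices]
        correct += scores.index(max(scores)) == answer
        total += 1
    return correct / total
```

This is eval-agnostic: PIQA-style tasks just need to be mapped into (context, choices, answer) triples, and perplexity-style evals like LAMBADA can reuse `sequence_logprob` directly.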
I think https://github.com/EleutherAI/lm-evaluation-harness would be a good place to start here. Anyone doing this should be aware that it is going to be refactored (https://www.youtube.com/watch?v=6qDOUeQTp0I), so they should probably chat to the Eleuther people first.
@ArthurConmy Do you know if that refactoring has been done? Could someone have a go at this now?
I don't think the refactor is done.
I guess HF Evaluate works differently from the way lm-eval-harness downloads datasets; maybe we should do this?
All three datasets mentioned are in lm-eval-harness though. I propose that we add lm-eval-harness as an optional dependency (e.g. so that pip install transformer-lens[evals] installs it) and add a way to pass a HookedTransformer to the eval harness (currently it only supports HF AutoModelForCausalLM models, I think).
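The adapter could look roughly like the sketch below: a wrapper class implementing the harness's `loglikelihood` contract (a list of (context, continuation) pairs in, a list of (summed log-prob, is_greedy) pairs out). The import path, the base-class details, and the simplified model interface (token-id lists and log-prob rows rather than real tensors) are all assumptions; check the harness source before relying on them, especially given the pending refactor:

```python
try:
    # pre-refactor import path; likely to move in the refactor
    from lm_eval.base import LM
except ImportError:
    LM = object  # allows experimenting without the harness installed


class HookedTransformerLM(LM):
    """Adapter exposing a HookedTransformer-like model to lm-eval-harness.

    Assumes `model.to_tokens(text)` returns a list of token ids and calling
    the model returns per-position log-prob rows indexable by token id
    (a simplified stand-in for HookedTransformer's tensor outputs).
    """

    def __init__(self, model):
        self.model = model

    def loglikelihood(self, requests):
        # requests: list of (context, continuation) string pairs; returns
        # [(summed log-prob of continuation, whether it was the greedy
        #   decode), ...] as the harness contract expects.
        results = []
        for context, continuation in requests:
            ctx = self.model.to_tokens(context)
            full = self.model.to_tokens(context + continuation)
            logprobs = self.model(full)
            total, greedy = 0.0, True
            for pos in range(len(ctx), len(full)):
                row = logprobs[pos - 1]
                total += row[full[pos]]
                if max(row, key=row.get) != full[pos]:
                    greedy = False
            results.append((total, greedy))
        return results
```

With something like this, the optional extra would just pull in the harness, and users could run any of its tasks against a HookedTransformer without us reimplementing each eval.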