alignment-handbook
Add instructions to evaluate on academic datasets
The paper evaluates on ARC, HellaSwag, MMLU, and TruthfulQA, but this repo does not reference these evals. Adding a short explanation of how to run them (e.g., in https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models) would be nice.
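
For reference, a minimal sketch of what such instructions could cover, assuming EleutherAI's lm-evaluation-harness (v0.4+) is used, as on the Open LLM Leaderboard. The model name is just an illustrative placeholder, and the few-shot counts are the leaderboard's settings (25/10/5/0), not something this repo prescribes:

```python
# Sketch: score a chat model on the four academic benchmarks with
# lm-evaluation-harness, one task at a time so each can use its own
# few-shot setting. Assumes `pip install lm-eval` (v0.4+).
import lm_eval

# (task name, num_fewshot) pairs following the Open LLM Leaderboard config
TASKS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("mmlu", 5),
    ("truthfulqa_mc2", 0),
]

for task, num_fewshot in TASKS:
    out = lm_eval.simple_evaluate(
        model="hf",  # Hugging Face transformers backend
        model_args="pretrained=HuggingFaceH4/zephyr-7b-beta",  # example model
        tasks=[task],
        num_fewshot=num_fewshot,
        batch_size=8,
    )
    # Print the per-task metrics dict (e.g., acc / acc_norm)
    print(task, out["results"][task])
```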