elk-generalization
Support logging of accuracy for non-constant answer choices
Hugging Face evaluation doesn't allow the answer choices to be passed to the evaluation metric function, so we currently assert that the answer choices are the same for all examples.
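As a rough sketch of what per-example accuracy could look like once the choices are available to the metric: the function below is hypothetical and not part of this repository or of any Hugging Face API. It assumes that each example provides the logits at the answer position, its own list of answer-choice token ids, and the index of the correct choice within that list. How the choices would actually reach the metric function is left open.

```python
from typing import Sequence


def accuracy_with_per_example_choices(
    answer_logits: Sequence[Sequence[float]],
    choice_ids: Sequence[Sequence[int]],
    labels: Sequence[int],
) -> float:
    """Accuracy when every example may have a different set of answer choices.

    All three arguments are hypothetical inputs for illustration:
      answer_logits[i] - vocabulary logits at the answer position of example i
      choice_ids[i]    - token ids of the answer choices of example i
      labels[i]        - index into choice_ids[i] of the correct choice
    """
    if not (len(answer_logits) == len(choice_ids) == len(labels)):
        raise ValueError("all inputs must have one entry per example")
    if len(labels) == 0:
        raise ValueError("need at least one example")

    correct = 0
    for logits, ids, label in zip(answer_logits, choice_ids, labels):
        if not 0 <= label < len(ids):
            raise ValueError("label must index into this example's choices")
        # Restrict the logits to this example's own choices, then take the argmax.
        scores = [logits[token_id] for token_id in ids]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy usage: a vocabulary of 5 tokens and a different choice set per example.
answer_logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],
    [0.5, 0.0, 0.0, 1.5, 0.2],
    [0.0, 0.0, 3.0, 0.0, 1.0],
]
choice_ids = [[1, 2], [0, 3, 4], [4, 2]]
labels = [0, 1, 0]
accuracy = accuracy_with_per_example_choices(answer_logits, choice_ids, labels)
```

In this toy input the first two examples are predicted correctly and the third is not. Scoring each example against its own choice list removes the need to assert that the choices are constant.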