eyuansu62

Results: 24 issues by eyuansu62

Will you open-source the evaluation benchmark?

Great work! I am struggling to understand how this segment works: ''' Here are some examples of prompts you provide: @@example prompt1@@ &&category1&& @@example prompt2@@ &&category2&& ··· @@example prompt9@@...

When I evaluate on the 'mmlu_generative' dataset, no group score is produced; the Groups table contains only its header:

```
|Groups|Version|Filter|n-shot|Metric| |Value| |Stderr|
|------|-------|------|------|------|---|-----|---|------|
```

We found the following bugs in mmlu:

1. The doc_to_text function in mmlu_flan_cot_fewshot is incorrect, so the few-shot data does not align with the predefined examples specified in the fewshot_config for...