eyuansu62

Results 24 issues of eyuansu62

Evaluation benchmark

Will you open-source the evaluation benchmark?

Question about the prompt.

Great work! I am struggling to comprehend the mechanism of this segment: ''' Here are some examples of prompts you provide: @@example prompt1@@ &&category1&& @@example prompt2@@ &&category2&& ··· @@example prompt9@@...

Bug about the output information.

When I run the 'mmlu_generative' dataset, the group score does not produce any output.
```
|Groups|Version|Filter|n-shot|Metric| |Value| |Stderr|
|------|-------|------|------|------|---|-----|---|------|
```

fix some bugs of mmlu

1 comment

We found the following bugs in mmlu: 1. The doc_to_text function in mmlu_flan_cot_fewshot is incorrect, so the few-shot data does not align with the pre-defined examples specified in the fewshot_config for...
