Changes to support pass@k evaluation on the HumanEval dataset

Open shubhra opened this issue 2 years ago • 0 comments

Example:

numactl -C0-15 python deepsparse/src/deepsparse/transformers/eval_downstream.py \
        <model_path> \
        --num-cores 16 \
        --dataset openai_humaneval \
        --humaneval-method pass_at_k \
        --engine deepsparse \
        --start 0 \
        --max-samples 2

This creates a subset of the HumanEval dataset that starts at index 0 (start) and contains 2 samples (max-samples), then runs the evaluation on that subset.
If the benchmark-humaneval argument is supplied, the evaluation instead runs on a pre-selected smaller subset of 11 samples and ignores start and max-samples.
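
The selection rules above can be sketched roughly as follows. This is an illustration only, not the code in eval_downstream.py: the function name select_tasks, its parameters, and the benchmark indices used in the usage lines are all hypothetical, and the actual 11 pre-selected samples are not listed here.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_tasks(
    tasks: Sequence[T],
    start: int = 0,
    max_samples: Optional[int] = None,
    benchmark_indices: Optional[Sequence[int]] = None,
) -> List[T]:
    """Pick the tasks to evaluate (hypothetical helper, for illustration).

    If benchmark_indices is given, it takes priority and start/max_samples
    are ignored. Otherwise the result is a contiguous slice that begins at
    start and holds at most max_samples items.
    """
    if benchmark_indices is not None:
        return [tasks[i] for i in benchmark_indices]
    end = None if max_samples is None else start + max_samples
    return list(tasks[start:end])


# HumanEval has 164 tasks; integers stand in for the task records here.
all_tasks = list(range(164))
subset = select_tasks(all_tasks, start=0, max_samples=2)

# Placeholder indices only: with a benchmark list, start/max_samples are ignored.
benchmark = select_tasks(all_tasks, start=5, max_samples=2, benchmark_indices=[3, 10, 42])
```
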
Set humaneval-method to perplexity to evaluate perplexity instead of pass@k.
Add --n-solutions <n> to specify the number of solutions required per task. The default is 1.
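
For context on how the number of solutions relates to pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of solutions generated for a task and c is the number that pass the tests. Whether this change computes the metric in exactly this way is an assumption; the function name pass_at_k is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task.

    n: solutions generated for the task (what --n-solutions would control,
       assuming the script uses this estimator)
    c: how many of those solutions passed the unit tests
    k: the k in pass@k; must satisfy 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing solutions: any k-subset contains a pass.
        return 1.0
    # Probability that a random k-subset contains no passing solution,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With the default of one solution per task, only pass@1 is defined, and it is 1.0 or 0.0 per task; the dataset-level score is the mean over tasks. Estimating pass@k for a larger k needs at least k solutions per task.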

Note: Remove numactl -C0-15 if you don't need to specify which cores to run on.

Aug 11 '23 14:08 shubhra