Request for Generation Parameters and Benchmark Setup Details
Hello,
I am trying to reproduce the benchmark results reported in the Qwen2.5-Coder technical report. However, I couldn’t find detailed information about the generation parameters (e.g., temperature, top-k, top-p, number of beams) or the benchmark setup, specifically for HumanEval on the 7B model.
Could you please provide more details about the configurations and settings used during the evaluation?
Thank you for your help!
For most evaluations, we use greedy decoding. You can find all the evaluation details in our evaluation scripts.
https://github.com/QwenLM/Qwen2.5-Coder/tree/main/qwencoder-eval/instruct
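For readers unfamiliar with the term, here is a minimal sketch of what greedy decoding means in practice. It is not taken from the evaluation scripts linked above. `greedy_generation_kwargs` and `greedy_decode` are hypothetical helper names, and the `max_new_tokens` default is an assumed value. The `do_sample` and `num_beams` keys are the standard Hugging Face transformers `generate` arguments; with sampling off and a single beam, temperature, top-k, and top-p have no effect.

```python
def greedy_generation_kwargs(max_new_tokens=512):
    """Keyword arguments that make a transformers `generate` call decode greedily.

    max_new_tokens is an assumed default, not a value from the technical report.
    """
    return {
        "do_sample": False,  # no sampling, so temperature/top-k/top-p are unused
        "num_beams": 1,      # no beam search
        "max_new_tokens": max_new_tokens,
    }


def greedy_decode(score_fn, prefix, eos, max_steps=16):
    """Toy illustration: at each step, append the highest-scoring next token.

    score_fn is a hypothetical stand-in for a model; it maps the current
    sequence to a dict of {token: score}.
    """
    seq = list(prefix)
    for _ in range(max_steps):
        scores = score_fn(seq)
        token = max(scores, key=scores.get)  # argmax over candidate tokens
        seq.append(token)
        if token == eos:
            break
    return seq
```

With a loaded model this would be used roughly as `model.generate(**inputs, **greedy_generation_kwargs())`. Please check the scripts in the repository for the exact settings used per benchmark.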
Thank you! I'll check that out