How can the result be transformed into the format required for SWE-bench evaluation?
Wonderful work! I notice that SWE-bench evaluation requires the following files:
- eval.sh: the evaluation script
- patch.diff: the model's generated prediction
- report.json: summary of evaluation outcomes for this instance
- run_instance.log: a log of SWE-bench evaluation steps
- test_output.txt: the output of running eval.sh on patch.diff

In AutoCodeRover we only get the JSON and patch.diff. How can we get test_output.txt?
Thanks a lot!
Hi! You would need to first transform the JSON into JSONL (with a simple Python script, for example), then evaluate the JSONL with SWE-bench's containerized evaluation. Afterwards, you will find these files in SWE-bench/logs/.
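The JSON-to-JSONL step can be sketched as below. This assumes the predictions file is a JSON array of prediction objects (the file and field names here are illustrative; check your actual output):

```python
import json

def json_to_jsonl(src: str, dst: str) -> None:
    """Convert a JSON file containing a list of prediction dicts
    into JSONL format: one JSON object per line."""
    with open(src) as f:
        predictions = json.load(f)
    with open(dst, "w") as f:
        for pred in predictions:
            f.write(json.dumps(pred) + "\n")

# Example usage (adjust paths to your run's output):
# json_to_jsonl("predictions_for_swebench.json", "predictions_for_swebench.jsonl")
```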
Hi @crhf, when I run AutoCodeRover on SWE-bench Lite (using the Docker image), I receive a file predictions_for_swebench.json.
You mean: take this file, transform it to JSONL, then evaluate it with SWE-bench's containerized evaluation? For example:
```shell
python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Lite \
    --predictions_path predictions_for_swebench.jsonl \
    --max_workers 1 \
    --run_id evaluation
```
So the --predictions_path field will be predictions_for_swebench.jsonl. Is that correct?