Work in progress: Implement CodeRAG-Bench benchmark
This is an ongoing draft for issue #1462: Implement CodeRAG-Bench benchmark.
CodeRAG-Bench is designed to enable rigorous evaluations and advance research on retrieval-augmented code generation.
https://code-rag-bench.github.io/
This benchmark comprises 7 separate coding tasks and evaluates the retrieval and generation stages separately. It also provides open retrieval over 5 different corpora, so it will take some time to implement. This draft PR tracks my ongoing progress on implementing CodeRAG-Bench.
Progress on April 26
Done:
- Created initial directory structure for CodeRAG benchmark; due to its complexity, it is placed in a separate folder rather than a single Python file
- Implemented basic logic for downloading and preprocessing the "humaneval" sub-task
- Canonical retrieval implementation for the "humaneval" sub-task
- Code evaluation logic for HumanEval
- Examples for humaneval
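The canonical-retrieval + evaluation flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual CodeRAG-Bench code: the names `Task`, `retrieve_canonical`, and `evaluate` are made up for this example. The key idea is that in canonical retrieval the gold document for a HumanEval task is its reference solution, and evaluation executes prompt + completion against the task's unit tests.

```python
# Minimal sketch of canonical retrieval + execution-based evaluation for a
# HumanEval-style task. All names here are illustrative, not the real API.
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str               # function signature + docstring
    canonical_solution: str   # reference completion (also the canonical document)
    test: str                 # unit-test source defining `check(candidate)`


def retrieve_canonical(task: Task) -> list[str]:
    """Canonical retrieval: the gold document is the reference solution itself."""
    return [task.canonical_solution]


def evaluate(task: Task, completion: str) -> bool:
    """Run prompt + completion against the task's unit tests; True if they pass."""
    program = task.prompt + completion + "\n" + task.test
    env: dict = {}
    try:
        exec(program, env)
        # HumanEval tests expose `check(candidate)`; the candidate is the
        # function named in the prompt's signature.
        fn_name = task.prompt.split("def ")[1].split("(")[0]
        env["check"](env[fn_name])
        return True
    except Exception:
        return False


task = Task(
    task_id="HumanEval/0",
    prompt='def add(a, b):\n    """Return a + b."""\n',
    canonical_solution="    return a + b\n",
    test="def check(candidate):\n    assert candidate(1, 2) == 3\n",
)
docs = retrieve_canonical(task)
passed = evaluate(task, task.canonical_solution)
```

In the real benchmark the execution step would additionally need sandboxing and timeouts; this sketch omits both for brevity.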
Ongoing: Editing docstrings to complete an MVP version that supports the full RAG (canonical) + evaluation pipeline for the HumanEval sub-task.
To be done:
- Support for the remaining 6 sub-tasks
- Generation support
- Open-corpus retrieval support
- Tests and examples
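For the open-corpus retrieval item above, one possible shape is sketched below. This is a hypothetical stand-in, not the planned implementation: it ranks corpus documents against the task prompt by simple token overlap, whereas the benchmark's actual retrievers (e.g. BM25 or dense embeddings) would replace the scoring function.

```python
# Illustrative open-corpus retrieval sketch: rank documents by token overlap
# with the query. The function names here are hypothetical placeholders.
def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a real retriever would do better."""
    return set(text.lower().split())


def retrieve_open(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k corpus documents by token-overlap score with the query."""
    scored = sorted(
        corpus,
        key=lambda doc: len(tokenize(query) & tokenize(doc)),
        reverse=True,
    )
    return scored[:k]


corpus = [
    "def fibonacci(n): return n if n < 2 else fibonacci(n-1) + fibonacci(n-2)",
    "import requests  # HTTP client usage",
    "def add(a, b): return a + b",
]
top = retrieve_open("implement add a b", corpus, k=1)
```

Swapping `tokenize`/scoring for a proper retriever while keeping this interface would let the same evaluation code measure retrieval quality (e.g. recall@k) across the 5 corpora.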
Description
Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [ ] I have read the CONTRIBUTION guide (required)
- [ ] I have linked this PR to an issue using the Development section on the right sidebar or by adding `Fixes #issue-number` in the PR description (required)
- [ ] I have checked if any dependencies need to be added or updated in `pyproject.toml` and `uv lock`
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature)
- [ ] I have updated the documentation if needed:
- [ ] I have added examples if this is a new feature
If you are unsure about any of these, don't hesitate to ask. We are here to help!
Thank you @Wendong-Fan for your constructive feedback. I have updated the code accordingly, and now it is ready for review again.
I have moved this PR from my forked repo to the main repo (PR #2362) to allow smoother future collaboration and CI/CD testing. I will close this PR for now. Thank you for your comments @Wendong-Fan @zjrwtx @sunchengxuanivy on PR #2199. I have resolved Wendong's comments and will continue to work on @zjrwtx's comment.