Work in progress: Implement CodeRAG-Bench benchmark
This is an ongoing draft for issue #1462: Implement CodeRAG-Bench benchmark.
CodeRAG-Bench is designed to enable rigorous evaluations and advance research on retrieval-augmented code generation.
https://code-rag-bench.github.io/
This benchmark comprises 7 separate coding tasks and evaluates the retrieval and generation stages separately. It also provides open retrieval over 5 different corpora, so it will take some time to implement. This draft PR tracks my ongoing progress on implementing CodeRAG-Bench.
Progress on April 26
Done:
- Created initial directory structure for CodeRAG benchmark; due to its complexity, it is placed in a separate folder rather than a single Python file
- Implemented basic logic for downloading and preprocessing the "humaneval" sub-task
- Canonical retrieval implementation for the "humaneval" sub-task
- Code evaluation logic for HumanEval
- Examples for humaneval
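The canonical-retrieval + evaluation flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual CodeRAG-Bench code: the names `Task`, `retrieve_canonical`, and `evaluate` are made up for this example. The key idea is that in canonical retrieval the gold document for a HumanEval task is its reference solution, and evaluation executes prompt + completion against the task's unit tests.

```python
# Minimal sketch of canonical retrieval + execution-based evaluation for a
# HumanEval-style task. All names here are illustrative, not the real API.
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str               # function signature + docstring
    canonical_solution: str   # reference completion (also the canonical document)
    test: str                 # unit-test source defining `check(candidate)`


def retrieve_canonical(task: Task) -> list[str]:
    """Canonical retrieval: the gold document is the reference solution itself."""
    return [task.canonical_solution]


def evaluate(task: Task, completion: str) -> bool:
    """Run prompt + completion against the task's unit tests; True if they pass."""
    program = task.prompt + completion + "\n" + task.test
    env: dict = {}
    try:
        exec(program, env)
        # HumanEval tests expose `check(candidate)`; the candidate is the
        # function named in the prompt's signature.
        fn_name = task.prompt.split("def ")[1].split("(")[0]
        env["check"](env[fn_name])
        return True
    except Exception:
        return False


task = Task(
    task_id="HumanEval/0",
    prompt='def add(a, b):\n    """Return a + b."""\n',
    canonical_solution="    return a + b\n",
    test="def check(candidate):\n    assert candidate(1, 2) == 3\n",
)
docs = retrieve_canonical(task)
passed = evaluate(task, task.canonical_solution)
```

In the real benchmark the execution step would additionally need sandboxing and timeouts; this sketch omits both for brevity.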
Ongoing: Editing docstrings to complete an MVP version that supports the full RAG (canonical) + evaluation pipeline for the HumanEval sub-task.
To be done:
- Support for the remaining 6 sub-tasks
- Generation support
- Open-corpus retrieval support
- Tests and examples
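For the open-corpus retrieval item above, one possible shape is sketched below. This is a hypothetical stand-in, not the planned implementation: it ranks corpus documents against the task prompt by simple token overlap, whereas the benchmark's actual retrievers (e.g. BM25 or dense embeddings) would replace the scoring function.

```python
# Illustrative open-corpus retrieval sketch: rank documents by token overlap
# with the query. The function names here are hypothetical placeholders.
def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a real retriever would do better."""
    return set(text.lower().split())


def retrieve_open(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k corpus documents by token-overlap score with the query."""
    scored = sorted(
        corpus,
        key=lambda doc: len(tokenize(query) & tokenize(doc)),
        reverse=True,
    )
    return scored[:k]


corpus = [
    "def fibonacci(n): return n if n < 2 else fibonacci(n-1) + fibonacci(n-2)",
    "import requests  # HTTP client usage",
    "def add(a, b): return a + b",
]
top = retrieve_open("implement add a b", corpus, k=1)
```

Swapping `tokenize`/scoring for a proper retriever while keeping this interface would let the same evaluation code measure retrieval quality (e.g. recall@k) across the 5 corpora.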
Description
Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [ ] I have read the CONTRIBUTION guide (required)
- [ ] I have linked this PR to an issue using the Development section on the right sidebar or by adding `Fixes #issue-number` in the PR description (required)
- [ ] I have checked if any dependencies need to be added or updated in `pyproject.toml` and `uv lock`
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature)
- [ ] I have updated the documentation if needed:
- [ ] I have added examples if this is a new feature
If you are unsure about any of these, don't hesitate to ask. We are here to help!
Thank you @Wendong-Fan for your constructive feedback. I have updated the code accordingly, and now it is ready for review again.
I have moved this PR from my forked repo to the main repo (PR #2362) to allow smoother future collaboration and CI/CD testing. I will close this PR for now. Thank you for your comments @Wendong-Fan @zjrwtx @sunchengxuanivy on PR #2199. I have resolved Wendong's comments and will continue to work on @zjrwtx's comment.