
Work in progress: Implement CodeRAG-Bench benchmark

Open • boerz-coding opened this issue 8 months ago • 1 comment

This is an ongoing draft for issue #1462: Implement CodeRAG-Bench benchmark.

CodeRAG-Bench is designed to enable rigorous evaluations and advance research on retrieval-augmented code generation.

https://code-rag-bench.github.io/

This benchmark has 7 separate coding tasks and evaluates the retrieval and generation stages separately. It also supports open retrieval from 5 different corpora, so it will take some time to implement. This draft PR tracks my ongoing progress on implementing CodeRAG-Bench.
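Because retrieval and generation are scored separately, the two metric types can be sketched independently. The snippet below is a minimal, hypothetical sketch, assuming NDCG@k over graded relevance labels for retrieval and the unbiased pass@k estimator (from the Codex/HumanEval paper) for execution-based generation; the function names are illustrative and not taken from this PR or from CAMEL.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, int], k: int = 10) -> float:
    """Hypothetical helper: NDCG@k for one query (unlabeled doc ids count as 0)."""
    # DCG over the top-k retrieved documents; 0-based rank r is discounted by log2(r + 2).
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal DCG: the best possible ordering of the known relevance labels.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Hypothetical helper: unbiased pass@k from n samples, c of which pass."""
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product.
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

For example, `pass_at_k(10, 3, 1)` gives 0.3 (up to floating-point rounding), matching the intuition that pass@1 equals the fraction of passing samples.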

Progress as of April 26. Done:

  • Created initial directory structure for CodeRAG benchmark; due to its complexity, it is placed in a separate folder rather than a single Python file
  • Implemented basic logic for downloading and preprocessing the "humaneval" sub-task
  • Canonical retrieval implementation for the "humaneval" sub-task
  • Code evaluation logic for HumanEval
  • Examples for HumanEval
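The code evaluation step for HumanEval boils down to running a completion against the problem's unit tests. The sketch below is hypothetical (the function name is not from this PR) and assumes the standard HumanEval record fields `prompt`, `test` (which defines `check(candidate)`), and `entry_point`. It uses `exec()` in-process only for illustration; a real harness should sandbox execution with a subprocess, a timeout, and resource limits, since this version cannot stop infinite loops.

```python
from typing import Any, Dict


def check_humaneval_sample(problem: Dict[str, str], completion: str) -> bool:
    """Hypothetical helper: True if prompt + completion passes `check`.

    WARNING: runs untrusted model output with exec() in the current process.
    """
    # Assemble the full program: function signature/docstring, the model's
    # completion, the test definitions, and the call to check().
    program = (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + "\n\n"
        + f"check({problem['entry_point']})\n"
    )
    namespace: Dict[str, Any] = {}
    try:
        exec(program, namespace)
    except Exception:
        # A failed assertion, syntax error, or runtime error fails the sample.
        return False
    return True
```

A toy problem in the same shape:

```python
problem = {
    "prompt": 'def add(a, b):\n    """Return a + b."""\n',
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
check_humaneval_sample(problem, "    return a + b\n")  # True
```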

Ongoing: editing docstrings to complete an MVP version that supports the full RAG (canonical) + evaluation pipeline for the HumanEval subtask.
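In the canonical RAG setting, the retrieved documents are placed in front of the problem prompt before generation. The sketch below is a minimal, hypothetical illustration of that prompt assembly, assuming "canonical" means using the gold documents annotated for each problem instead of a live retriever; the function name, the `max_docs` cap, and the blank-line separator are assumptions, not the PR's actual format.

```python
from typing import List


def build_rag_prompt(problem_prompt: str, docs: List[str], max_docs: int = 5) -> str:
    """Hypothetical helper: prepend up to `max_docs` documents to a prompt."""
    # Keep only the top documents and separate them with blank lines.
    context = "\n\n".join(docs[:max_docs])
    if not context:
        # No retrieved context: fall back to the plain (non-RAG) prompt.
        return problem_prompt
    return f"{context}\n\n{problem_prompt}"
```

Keeping the original prompt last means the model's completion continues directly from the function signature, which is what the HumanEval-style execution check expects.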

To be done:

  • Support for the remaining 6 sub-tasks
  • Generation support
  • Open-corpus retrieval support
  • Tests/examples

Description

Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • [ ] I have read the CONTRIBUTION guide (required)
  • [ ] I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • [ ] I have checked if any dependencies need to be added or updated in pyproject.toml and uv.lock
  • [ ] I have updated the tests accordingly (required for a bug fix or a new feature)
  • [ ] I have updated the documentation if needed:
  • [ ] I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

boerz-coding • Apr 16 '25


Thank you @Wendong-Fan for your constructive feedback. I have updated the code accordingly, and now it is ready for review again.

boerz-coding • May 04 '25

I have moved this PR from my fork to the main repository (PR #2362) to allow smoother collaboration and CI/CD testing, so I will close this PR for now. Thank you for your comments on PR #2199, @Wendong-Fan, @zjrwtx, and @sunchengxuanivy. I have resolved Wendong's comments and will continue working on @zjrwtx's comment.

boerz-coding • May 08 '25