[Roadmap] BigCodeBench Q3 2024 Roadmap
This document lists the features planned for BigCodeBench in Q3 2024. Feel free to discuss and contribute: this roadmap is shaped by the BigCodeBench community.
Help Wanted
- [x] Lingering processes #42
- [ ] Better documentation #40, #41
- [ ] Dataset Repair, e.g., #33
- [ ] Tests & CI/CD
- [x] Flexible Pass@k Support #50
Feature
- [x] #46, #36
- [x] Customized direct completion setup (to be released)
- [x] Catch up on the progress of EvalPlus
Dataset
- [ ] Further investigation of the BigCodeBench tasks
Ongoing Research
- [ ] Benchmarking more languages -- Verilog & R, cc @shailja-thakur @ThreeCirclesK @marianna13
- [ ] Agentic Evaluation (proof-of-concept infra to be scaled up), cc @JoshuaPurtell
- [ ] Grounded Zero-Shot Tool Use, cc @terryyz @siviltaram