eval-dev-quality
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Part of #128
The v0.5.0 release is mainly meant to introduce more variety. There are three main goals: 1. Introduce more logical cases, to make sure that "better models" show a bigger difference in...
### Tasks

- [x] Introduce 2 assessment keys:
  - `AssessmentKeyResponseCharacterCount`
  - `AssessmentKeyGenerateTestsForFileCharacterCount`
- [x] LLM model
  - File: `model/llm/llm.go`
  - Function: `GenerateTestsForFile`
- [x] When parsing the model response, count...
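The character counting behind the two assessment keys could be sketched as follows. This is a minimal sketch, not the actual implementation in `model/llm/llm.go`; the helper name `characterCounts` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// characterCounts is a hypothetical helper sketching how the two assessment
// keys above could be computed: the length of the full model response and the
// length of the test code extracted from it. Runes are counted instead of
// bytes so multi-byte characters count once.
func characterCounts(response string, extractedTestCode string) (responseCount int, testCodeCount int) {
	return utf8.RuneCountInString(response), utf8.RuneCountInString(extractedTestCode)
}

func main() {
	code := "func add(a, b int) int { return a + b }"
	response := "Sure! Here is the test:\n" + code
	responseCount, testCodeCount := characterCounts(response, code)
	fmt.Println(responseCount, testCodeCount)
}
```

Counting both values separately makes it possible to compare how much of a model's response is actual test code versus surrounding chatter.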
@zimmski The regex to check for the temporary test directory does not work on Windows right now, but I didn't want to postpone the PR further because of it. Would...
https://github.com/symflower/eval-dev-quality/actions/runs/9139604580/job/25132073770#step:9:841
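One way to make such a check OS-agnostic is to normalize path separators before matching, so a single regex covers both Unix and Windows paths. A sketch, assuming the temporary directories are named `eval-dev-quality` plus a numeric suffix (the real prefix and regex may differ):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumption: temporary test directories are named "eval-dev-quality" plus a
// numeric suffix. The actual naming in the project may differ.
var temporaryTestDirectoryRe = regexp.MustCompile(`/eval-dev-quality[0-9]+/`)

// isInTemporaryTestDirectory sketches an OS-agnostic check: Windows paths use
// "\" as the separator, so every separator is normalized to "/" before the
// regex is applied, instead of matching the raw path.
func isInTemporaryTestDirectory(path string) bool {
	normalized := strings.ReplaceAll(path, `\`, "/")
	return temporaryTestDirectoryRe.MatchString(normalized)
}

func main() {
	fmt.Println(isInTemporaryTestDirectory(`C:\Temp\eval-dev-quality123\plain_test.go`))
	fmt.Println(isInTemporaryTestDirectory("/tmp/eval-dev-quality456/plain_test.go"))
	fmt.Println(isInTemporaryTestDirectory("/home/user/project/plain_test.go"))
}
```

Normalizing the input rather than maintaining per-OS regexes keeps the pattern itself portable.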
We need a common helper to sandbox all the executions we are doing. Right now, an LLM could generate a remove-all-your-files call, and we just execute it.
We want at least:

- goimports (maybe even https://github.com/mvdan/gofumpt) for formatting
- https://github.com/dominikh/go-tools
- https://github.com/mgechev/revive