
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

83 eval-dev-quality issues

v0.5.0 is mainly meant to introduce more variety. There are three main goals: 1. Introduce more logical cases, to make sure that "better models" have a bigger difference in...

enhancement

### Tasks
- [x] Introduce 2 assessment keys:
  - AssessmentKeyResponseCharacterCount
  - AssessmentKeyGenerateTestsForFileCharacterCount
- [x] LLM model
  - File: `model/llm/llm.go`
  - Function: `GenerateTestsForFile`
- [x] When parsing the model response, count...

enhancement
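A minimal Go sketch of how the two assessment keys could be recorded when parsing a `GenerateTestsForFile` response. The `AssessmentKey` type, the `Assessments` map, the key string values and the `recordResponse` helper are hypothetical illustrations, not the repository's actual definitions, and "character" is assumed to mean a Unicode code point rather than a byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// AssessmentKey names a single metric recorded for a model run.
// Hypothetical type: the real definitions in the repository may differ.
type AssessmentKey string

const (
	// AssessmentKeyResponseCharacterCount counts the characters of the full model response.
	AssessmentKeyResponseCharacterCount AssessmentKey = "response-character-count"
	// AssessmentKeyGenerateTestsForFileCharacterCount counts the characters of the extracted test code.
	AssessmentKeyGenerateTestsForFileCharacterCount AssessmentKey = "generate-tests-for-file-character-count"
)

// Assessments maps each key to its accumulated value (hypothetical).
type Assessments map[AssessmentKey]uint64

// Add increments the value stored for key by x.
func (a Assessments) Add(key AssessmentKey, x uint64) {
	a[key] += x
}

// countCharacters counts Unicode code points, not bytes. This is an assumption:
// the issue does not specify which unit "character" refers to.
func countCharacters(s string) uint64 {
	return uint64(utf8.RuneCountInString(s))
}

// recordResponse stores both character counts for one GenerateTestsForFile call.
// response is the full model output; testCode is the code extracted from it.
func recordResponse(a Assessments, response, testCode string) {
	a.Add(AssessmentKeyResponseCharacterCount, countCharacters(response))
	a.Add(AssessmentKeyGenerateTestsForFileCharacterCount, countCharacters(testCode))
}

func main() {
	a := Assessments{}
	testCode := "func TestSum(t *testing.T) {}"
	recordResponse(a, "Here is a test: "+testCode, testCode)
	// prints: 45 29
	fmt.Println(a[AssessmentKeyResponseCharacterCount], a[AssessmentKeyGenerateTestsForFileCharacterCount])
}
```

Counting code points keeps the metric stable for responses containing non-ASCII text; if byte counts are wanted instead, `len(s)` would be the simpler choice.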

Merge #27 first so we can test this and refactor.

enhancement

@zimmski The regex to check for the temporary test directory does not work on Windows right now, but I didn't want to postpone the PR further because of it. Would...

refactor
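The issue does not show the actual regex. As an illustration only, assuming a hypothetical temporary directory named `eval-dev-quality` followed by digits, a pattern that accepts both path separators could look like this in Go:

```go
package main

import (
	"fmt"
	"regexp"
)

// tempTestDirectory matches a whole path segment "eval-dev-quality<digits>"
// (hypothetical naming scheme). The class [\\/] accepts both '/' (Unix) and
// '\' (Windows) separators, which a pattern hard-coding '/' would miss.
var tempTestDirectory = regexp.MustCompile(`(?:^|[\\/])eval-dev-quality\d+(?:[\\/]|$)`)

// containsTempTestDirectory reports whether path contains the temporary directory.
func containsTempTestDirectory(path string) bool {
	return tempTestDirectory.MatchString(path)
}

func main() {
	for _, p := range []string{
		"/tmp/eval-dev-quality123/plain_test.go",
		`C:\Users\me\AppData\Local\Temp\eval-dev-quality123\plain_test.go`,
		"/home/me/eval-dev-quality/plain.go",
	} {
		// prints: true, true, false (one per line)
		fmt.Println(containsTempTestDirectory(p))
	}
}
```

An alternative is to normalize paths with `filepath.ToSlash` before matching; on Windows it replaces `\` with `/`, so a `/`-only pattern would then work there too.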

https://github.com/symflower/eval-dev-quality/actions/runs/9139604580/job/25132073770#step:9:841

bug
flaky
CI

We need a common helper to sandbox all the executions we are doing. Right now, an LLM could generate a remove-all-your-files call, and we just execute it.

enhancement
good first issue
help wanted

We want at least:
- goimports (maybe even https://github.com/mvdan/gofumpt) for formatting
- https://github.com/dominikh/go-tools
- https://github.com/mgechev/revive

enhancement
good first issue
help wanted