eval-dev-quality
Track how many characters are present in the code part and in the complete response
Tasks
- [x] Introduce 2 assessment keys:
  - AssessmentKeyResponseCharacterCount
  - AssessmentKeyGenerateTestsForFileCharacterCount
- [x] LLM model
  - File: model/llm/llm.go
  - Function: GenerateTestsForFile
  - [x] When parsing the model response, count the number of characters of the response and the test content
  - [x] Add it to the assessments
    - File:
- [x] Symflower
  - File: model/symflower/symflower.go
  - Function: GenerateTestsForFile
  - [x] Get the output of the symflower unit-tests command
  - [x] Parse the output and return the list of test files generated
  - [x] For each generated file, read its content and count the characters
  - Note: AssessmentKeyResponseCharacterCount == AssessmentKeyGenerateTestsForFileCharacterCount
- Note:
  - File:
  - AssessmentKeyCharacterCountResponse
  - AssessmentKeyCharacterCountGeneratedTests
Just looked into what we have right now. Let's do the following to better group assessments:
- AssessmentKeyResponseCharacterCount
- AssessmentKeyGenerateTestsForFileCharacterCount
Test it
I think you can always drop that task; it should go without saying. Every task you do should be tested.