eval-dev-quality

Track how many characters were present in code part / complete response

Open bauersimon opened this issue 1 year ago • 2 comments

Tasks

  • [x] Introduce 2 assessment keys:
    • AssessmentKeyResponseCharacterCount
    • AssessmentKeyGenerateTestsForFileCharacterCount
  • [x] LLM model
    • File: model/llm/llm.go
    • Function: GenerateTestsForFile
    • [x] When parsing the model response, count the characters in the complete response and in the extracted test code
    • [x] Add it to the assessments
  • [x] Symflower
    • File: model/symflower/symflower.go
    • Function: GenerateTestsForFile
    • [x] Get the output of the symflower unit-tests command
    • [x] Parse the output and return the list of test files generated
    • [x] For each generated file, read its content and count the characters
      • Note: for Symflower both counts are the same, i.e. AssessmentKeyResponseCharacterCount == AssessmentKeyGenerateTestsForFileCharacterCount
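
A rough sketch of the counting described in the tasks above, not the actual implementation. The types `AssessmentKey` and `Assessments`, the key string values, and the helpers `extractCode`, `countResponseCharacters` and `countFileCharacters` are all assumptions made for illustration. "Characters" is taken to mean Unicode code points rather than bytes, and the LLM response is assumed to wrap its code in a Markdown fence. Parsing the output of the symflower command is left out because its format is not given here, so the file paths are simply passed in.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode/utf8"
)

// AssessmentKey and Assessments are simplified, hypothetical stand-ins
// for the project's metric types.
type AssessmentKey string

const (
	// The string values are made up for this sketch.
	AssessmentKeyResponseCharacterCount             AssessmentKey = "response-character-count"
	AssessmentKeyGenerateTestsForFileCharacterCount AssessmentKey = "generate-tests-for-file-character-count"
)

type Assessments map[AssessmentKey]uint64

// fence is the Markdown code fence marker (three backticks).
var fence = strings.Repeat("`", 3)

// extractCode is a hypothetical helper that returns the content of the
// first fenced code block, or the whole response if there is no fence.
func extractCode(response string) string {
	start := strings.Index(response, fence)
	if start == -1 {
		return response
	}
	rest := response[start+len(fence):]
	// Skip the remainder of the opening fence line (e.g. the language tag).
	if newline := strings.Index(rest, "\n"); newline != -1 {
		rest = rest[newline+1:]
	}
	// Cut at the closing fence if there is one.
	if end := strings.Index(rest, fence); end != -1 {
		rest = rest[:end]
	}
	return rest
}

// countResponseCharacters covers the LLM case: one count for the complete
// response and one for the extracted test code.
func countResponseCharacters(response string) Assessments {
	return Assessments{
		AssessmentKeyResponseCharacterCount:             uint64(utf8.RuneCountInString(response)),
		AssessmentKeyGenerateTestsForFileCharacterCount: uint64(utf8.RuneCountInString(extractCode(response))),
	}
}

// countFileCharacters covers the Symflower case: the generated test files
// are the whole "response", so both keys receive the same value.
func countFileCharacters(filePaths []string) (Assessments, error) {
	var total uint64
	for _, filePath := range filePaths {
		content, err := os.ReadFile(filePath)
		if err != nil {
			return nil, fmt.Errorf("reading generated test file %q: %w", filePath, err)
		}
		total += uint64(utf8.RuneCount(content))
	}
	return Assessments{
		AssessmentKeyResponseCharacterCount:             total,
		AssessmentKeyGenerateTestsForFileCharacterCount: total,
	}, nil
}

func main() {
	response := "Here is the test:\n" + fence + "go\npackage foo\n" + fence + "\n"
	fmt.Println(countResponseCharacters(response))
}
```

Keeping both counts lets the evaluation compare how much of a response was actual test code versus surrounding text. For Symflower the two values coincide by construction.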

bauersimon avatar May 17 '24 13:05 bauersimon

AssessmentKeyCharacterCountResponse
AssessmentKeyCharacterCountGeneratedTests

Just looked into what we have right now. Let's do the following to better group assessments:

AssessmentKeyResponseCharacterCount
AssessmentKeyGenerateTestsForFileCharacterCount

zimmski avatar May 24 '24 09:05 zimmski

Test it

I think you can always drop that task. Should go without saying. Every task you do should be tested.

zimmski avatar May 24 '24 09:05 zimmski