eval-dev-quality

Interactive result comparison

Open bauersimon opened this issue 1 year ago • 0 comments

  • [ ] see list of solved tasks at a glance (similar to a JUnit html report)
    • [ ] including coverage count
    • [ ] or encountered errors
    • [ ] and any additional metrics we (will) collect (processing time, character count, ...)
    • [ ] with possibility to click and see the actual result (i.e. what the model generated)
  • [ ] diff/compare two model results
    • [ ] on a list basis (i.e. which tasks each model was able to solve and where they differ in coverage, character count, ...)
    • [ ] on a task basis (i.e. diffing the model results 1:1)
    • [ ] maybe also the ability to switch between results from different runs (i.e. compare the plain.go result from model A's run 2 against the plain.go result from model B's run 4)
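
To make the "list basis" comparison more concrete, here is a minimal sketch in Go. It is only an illustration under assumptions: the `TaskResult` and `Difference` types, their field names, the `CompareResults` function and the sample values are all hypothetical and not taken from the tool's actual result format.

```go
package main

import (
	"fmt"
	"sort"
)

// TaskResult is a hypothetical per-task record for one model run.
// The fields mirror the metrics mentioned in the checklist; the real schema may differ.
type TaskResult struct {
	Solved     bool
	Coverage   int
	Characters int
}

// Difference holds both results for a task on which two models disagree.
// InA and InB report whether the task was present in the respective result set.
type Difference struct {
	Task     string
	A, B     TaskResult
	InA, InB bool
}

// CompareResults returns every task whose result differs between the two
// result sets (or that exists in only one of them), sorted by task name.
func CompareResults(a, b map[string]TaskResult) []Difference {
	// Collect the union of task names from both result sets.
	seen := map[string]struct{}{}
	for task := range a {
		seen[task] = struct{}{}
	}
	for task := range b {
		seen[task] = struct{}{}
	}
	tasks := make([]string, 0, len(seen))
	for task := range seen {
		tasks = append(tasks, task)
	}
	sort.Strings(tasks)

	var differences []Difference
	for _, task := range tasks {
		resultA, inA := a[task]
		resultB, inB := b[task]
		if inA != inB || resultA != resultB {
			differences = append(differences, Difference{Task: task, A: resultA, B: resultB, InA: inA, InB: inB})
		}
	}

	return differences
}

func main() {
	// Made-up sample values, purely for illustration ("other.go" is a hypothetical task).
	modelA := map[string]TaskResult{
		"plain.go": {Solved: true, Coverage: 1, Characters: 120},
		"other.go": {Solved: true, Coverage: 3, Characters: 400},
	}
	modelB := map[string]TaskResult{
		"plain.go": {Solved: true, Coverage: 1, Characters: 120},
		"other.go": {Solved: false, Coverage: 0, Characters: 95},
	}

	for _, d := range CompareResults(modelA, modelB) {
		fmt.Printf("%s: A=%+v B=%+v\n", d.Task, d.A, d.B)
	}
}
```

Keying the results by task name keeps the comparison independent of the order in which tasks were executed. The same function could be reused for the "different runs" idea by passing in the result sets of any two runs.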

bauersimon, Jun 21 '24 08:06