eval-dev-quality

Keep individual coverage files and LLM query/responses

Open zimmski opened this issue 1 year ago • 3 comments

We need to keep all interactions. That includes the coverage files we are collecting.

zimmski • Jun 19 '24 11:06

What about https://github.com/symflower/eval-dev-quality/issues/181 then? Close?

bauersimon • Jun 20 '24 13:06

I think the cleanest solution would be to use logrus "Hooks". That way we can keep most of our logging as is, but, e.g., log prompts with a special type=prompt attribute and add a hook to the logging that also writes the prompt content into a separate file.

bauersimon • Jun 27 '24 10:06

Planning

Introduce structured logging so that there is a single place where artifacts like model responses or coverage files are saved to disk. With structured logging we can define keys/attributes like model, repository, task, etc., which then define how we log and save artifacts.

Which logging library?

After some research there are two candidates: https://github.com/sirupsen/logrus and https://pkg.go.dev/golang.org/x/exp/slog. Logrus has "hooks" to act on entries with specific attributes, while slog requires implementing a custom "Handler". Since we have a hierarchical logging structure, the "Handler" approach is preferable: the handler decides when it is necessary to log into a new file, and there can be a hierarchy of handlers. With hooks, a single hook would need to manage several files at once, e.g. one for every model.
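To illustrate the "Handler" approach, here is a minimal sketch, not the project's actual implementation. It assumes the standard library's `log/slog` (Go 1.21+), which has the same API shape as `golang.org/x/exp/slog`. The `modelHandler` type, the `open` callback and the `model` attribute key are hypothetical names chosen for this example. In practice `open` would create a file per model; an in-memory buffer is used here to keep the sketch self-contained.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log/slog"
)

// modelHandler (hypothetical) forwards every record to a parent handler and,
// once a "model" attribute has been attached, also to a sink for that model.
type modelHandler struct {
	parent slog.Handler                 // receives every record (the main log)
	child  slog.Handler                 // per-model sink, nil until a model is set
	open   func(model string) io.Writer // opens the sink for a model (assumption)
}

func (h *modelHandler) Enabled(ctx context.Context, level slog.Level) bool {
	return h.parent.Enabled(ctx, level)
}

func (h *modelHandler) Handle(ctx context.Context, r slog.Record) error {
	if h.child != nil {
		// Clone because the record is handed to more than one handler.
		if err := h.child.Handle(ctx, r.Clone()); err != nil {
			return err
		}
	}
	return h.parent.Handle(ctx, r)
}

func (h *modelHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := &modelHandler{parent: h.parent.WithAttrs(attrs), child: h.child, open: h.open}
	for _, a := range attrs {
		if a.Key == "model" {
			// The handler itself decides that a new model needs a new sink.
			next.child = slog.NewTextHandler(h.open(a.Value.String()), nil)
		}
	}
	if next.child != nil {
		next.child = next.child.WithAttrs(attrs)
	}
	return next
}

func (h *modelHandler) WithGroup(name string) slog.Handler {
	next := &modelHandler{parent: h.parent.WithGroup(name), child: h.child, open: h.open}
	if next.child != nil {
		next.child = next.child.WithGroup(name)
	}
	return next
}

// run logs one general and one model-specific message and returns the
// contents of the main log and of each per-model sink.
func run() (string, map[string]string) {
	var mainLog bytes.Buffer
	sinks := map[string]*bytes.Buffer{}
	open := func(model string) io.Writer {
		sinks[model] = &bytes.Buffer{}
		return sinks[model]
	}

	logger := slog.New(&modelHandler{parent: slog.NewTextHandler(&mainLog, nil), open: open})
	logger.Info("starting evaluation")
	logger.With("model", "model-a").Info("querying model", "task", "write-tests")

	perModel := map[string]string{}
	for model, buf := range sinks {
		perModel[model] = buf.String()
	}
	return mainLog.String(), perModel
}

func main() {
	mainLog, perModel := run()
	fmt.Print(mainLog)
	fmt.Print(perModel["model-a"])
}
```

The main log receives both records, while the sink for `model-a` only receives the record logged after `With("model", ...)`. Further levels of the hierarchy (repository, task) could be added the same way, which is the property a single logrus hook would have to emulate by tracking several files itself.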

Tasks

  • [x] Switch to new logging library slog (without changing the current logging behavior)
    • [x] Set attributes like model, task, repository, etc. accordingly
  • [ ] Write artifacts like model responses to disk.
    • [x] LLM responses
    • [ ] Coverage files
    • [ ] ...

ahumenberger • Jul 04 '24 09:07

This has been implemented.

zimmski • Jan 14 '25 11:01