OpenHands Logging fixes and improvements

This PR proposes a few fixes and improvements to logging:

[x] clean up ANSI characters from the file logs
[x] integrate llm logging with event stream sessions - for restored sessions, keep new logs in the same place
[x] allow logging of prompts/responses with sid (e.g. for evals)
[x] fix useless dir creation in some cases ~allow to disable/enable separately the runtime/docker logging (it's spammy again, for good reason but still spammy 😅)~ (split out, in follow up on hierarchical logging)
[x] refactor older logging to f-strings (initially all calls used old-style %s). There might be some performance cost, but we're using f-strings for a long time anyway, and I'd suggest we can deal with it if or when it happens.

Aug 16 '24 03:08 enyst

I noticed you replaced logger arguments with string interpolation. While IMHO string interpolation is easier to read, logger arguments can give other advantages, such as not having to interpolate strings at all depending on the log level and underlying implementation.

Is this a pattern we intend to use moving forward?

Aug 16 '24 18:08 tofarr

I noticed you replaced logger arguments with string interpolation. While IMHO string interpolation is easier to read, logger arguments can give other advantages, such as not having to interpolate strings at all depending on the log level and underlying implementation.

Is this a pattern we intend to use moving forward?

Hey, indeed, I think there is some performance cost, in theory. The funny history here that when we initially implemented the logging system, it had only old-style %s formatting precisely for the reason you identified: to avoid computing the string when it won't be logged anyway due to the log level.

However, after that, in time, the codebase got f-strings all the time. Nobody liked the old-style! 😅

Really, as you can see here I found only a few occurrences. Apparently these were the only ones left. 🤷 I frankly thought I'll find more, but no, they're f-strings already. (and I've definitely sinned too!)

Yes, IMHO it's time to admit we can't swim against this tide, too many people do it this way for too many years.. So my suggestion is to go the other way around: use nice syntax; then if and when we will see performance issues tracked to it, we can reconsider using lazy evaluation on a case by case basis - if we find something bad enough.

Aug 16 '24 19:08 enyst

Hi @enyst. Is getting this PR merged only a matter of resolving the conflicts?

Sep 07 '24 17:09 mamoodi

Hey @mamoodi ! Thanks for checking in. It's a little more than just conflicts: I want to see that it works to do LLM logs during evals, with multiple processes and custom locations. I always hack that in on my machine, so I need it, you know, cleanly enough for main.

Particularly relevant for the PRs that modify history, like the memGPT one, because in that case it can happen that what it actually sends to the LLM is not exactly what we see in the regular trajectories.

I'll fix conflicts and then I will see if it needs to be here or can split.

Sep 07 '24 18:09 enyst

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Oct 26 '24 01:10 github-actions[bot]