team-compass icon indicating copy to clipboard operation
team-compass copied to clipboard

Align our string formatting practices when using log functions

Open consideRatio opened this issue 2 years ago • 6 comments

In https://github.com/jupyterhub/jupyterhub-idle-culler/pull/58 I did some core maintenance, transitioning towards f-strings in general but as @manics pointed out, @Carreau and others make good points in https://github.com/jupyter-server/jupyter_server/issues/1186 about not using f-strings for logging function calls.


here are two examples of why, but there was also a security point raised that I've not considered myself yet.

https://github.com/jupyter-server/jupyter_server/issues/1186#issue-1552791320:

This prevent structured logging and post-processing, as the f-string is formatted before being passed to log.

The goal of log taking parameter if for for self.log.level('format string %s, %r', param_1, param_2 ), is that you can store the log in a format where you you actually ask "give me all the logs which format strings are 'format string %s, %r'" without having to parse text. And this allow handler like Rich, for example to do proper syntax highlighting, or allow compressed storage of logs indexed by the message.

https://github.com/jupyter-server/jupyter_server/issues/1186#issuecomment-1400523711:

:+1: also much better for performance With parameters, if no logging is done, no string formatting is done, while with f"" it will always be done.


Question to answer

Do we want to transition away from rendering strings ourselves using f-strings before we pass them to logging functions, but instead rely on the logging functions to do so for us by passing arguments and a string to be formatted with those arguments?

consideRatio avatar Feb 20 '23 11:02 consideRatio

I happen to like f-strings for almost everything except debug logs. The performance issue of delayed rendering is definitely relevant for debug logs (folks also often put the log statement itself behind an if), but if performance is the main concern, f-strings are actually preferable if your logs are visible most of the time (it's very rare for jupyterhub logs to be at a level other than info or debug, so f-strings for all log statements info-or-higher are going to be preferable in almost all cases).

The safety issue is only relevant for combining f-strings with unsanitized inputs and lazily-rendered arguments. If you don't have all 3, there are no safety issues (i.e. f-strings or lazy evaluation for a given message are equally safe). So it's definitely true that you shouldn't have log.info(f"{somethign} %s", something_else), but log.info(f"{something}") is just fine.

That mostly leaves the custom logging system features like indexing and formatting that can benefit from the format-string approach.

So we have to weigh: is the improved simplicity, clarity, and performance of using f-strings worth trading for improved support with Python logging extensions? I don't really know, because I'm not aware of any use of such extensions within JupyterHub. Most realistic JupyterHub deployments have to deal with the fact that logging comes from many sources, not just JupyterHub or even not just Python, so index/search has to work on raw logs anyway, and any Python-specific tool isn't going to be enough.

So my position right now would be: don't worry about it, but

  1. don't mix f-strings and lazy evaluation, and
  2. consider lazy evaluation (or other conditionals) for debug logging, to avoid unseen log statements being too costly.

minrk avatar Feb 20 '23 14:02 minrk

So my position right now would be: don't worry about it, but

  1. don't mix f-strings and lazy evaluation, and
  2. consider lazy evaluation (or other conditionals) for debug logging, to avoid unseen log statements being too costly.

This sounds fine to me, thank you for clarifying the security aspect @minrk!

@manics what do you think?

consideRatio avatar Feb 20 '23 14:02 consideRatio

The one thing that is not considered here is showing the right way to do things. And as JupyterHub and extensions are a relatively well known projects, I think we ought to be an example as otherwise people will copy our practices.

Carreau avatar Feb 20 '23 14:02 Carreau

Fair point! I don't happen to agree that deferred formatting is "the right way to do things" unconditionally. If other folks disagree, we can add the lint rule to prohibit f-string logs.

I think another important consideration is application vs libraries. Library loggers (like sqlalchemy) don't control the log output configuration, that's the responsibility of applications, and are far more likely to have no visible logs. In those cases, the performance trade-off is likely to go the other way, and maximum compatibility is far more important because the log statements should work well no matter what application has set up the logs. If I were writing a library, I would make a different choice.

JupyterHub or idle-culler, on the other hand, is an application that is itself responsible for setting up the log output. So we can know things like - info-level logs are almost always displayed, and log capture/aggregation is best and ~always done at the unstructured pipe level, not the Python logging level. I think in our case, the improved clarity and simplicity of f-strings wins, all things considered. But only by a tiny margin, and I wouldn't object to folks writing individual log statements the other way.

minrk avatar Feb 20 '23 15:02 minrk

@dolfinus who has done great work in KubeSpawner wrote in https://github.com/jupyterhub/kubespawner/pull/678#discussion_r1155842583:

Using f-strings in logs is not recommended. Intergrations like Sentry or Elastic APM are using template strings to count number of errors caused by a specific issue, but f-strings produces different strings for each substitution values combination

I guess we are not integrating with these services ourselves, but maybe we will do something similar at some point.

consideRatio avatar Apr 03 '23 11:04 consideRatio

Administrators on standalone installations may activate this kind of integration, I'll be better not to mess up with it

dolfinus avatar Apr 03 '23 12:04 dolfinus