SNOW-762783: Inconsistent logging message formatting in codebase - switch to % formatting
What is the current behavior?
The snowflake-connector-python codebase currently uses a mix of f-strings and % formatted strings for logging messages. This inconsistency can lead to potential performance and compatibility issues for users.
What is the desired behavior?
The desired behavior is to consistently use % formatted strings for logging messages throughout the snowflake-connector-python codebase.
If the maintainers are open to it, I would be more than willing to submit a pull request for this issue. Also completely fine with a "No" here, I totally understand. 🙂
How would this improve snowflake-connector-python?
Switching to % formatted strings for logging messages would bring the following improvements to the snowflake-connector-python library:
- Performance improvement: By using % formatting, log messages will only be evaluated if the logging level is appropriate for the message to be emitted. This could result in performance improvements, especially when logging complex or time-consuming expressions.
- Compatibility with custom loggers and handlers: Users who have their own custom loggers and handlers for processing log messages will benefit from the switch to % formatting. This will ensure that the library's log messages are easily compatible with custom logging solutions and provide users more control over when and how the string interpolation occurs.
- Better error handling: Using % formatting allows for better error handling, as any errors in formatting or expressions will only surface if the message is emitted. This enables users to catch and handle errors within custom loggers or handlers if necessary, rather than having the application crash immediately due to an issue with an f-string or
.format. - Consistency within the codebase: Having a consistent approach to formatting log messages within the codebase can make the code easier to read and maintain. This consistency also aligns with the Python logging library's internal use of %-formatting.
References, Other Background
thanks for reaching out and for the comprehensive analysis. this is an interesting topic.
I definitely agree upon that consistency is an issue that we should address. However, I'd like to know a bit more about the % formatter:
- for the perf improvement, I believe what you described is correct, however, I'd like to test out the real perf diff.
- for
Compatibility with custom loggers and handlersandBetter error handling, do you mind giving me some concrete example to illustrate how % format could help?
I will also bring this topic to my team, thanks for bringing it up!
Thank you for considering this proposal and for your response.
- In regards to performance improvement: I have created a script to benchmark the performance of f-strings and %-strings for logging messages. The script compares the time taken to log messages using f-strings and %-strings for both
Logger.errorandLogger.debugmethods:
import logging
import time
logger = logging.getLogger(__name__)
def benchmark_fstring(func, n):
name, age = "John Doe", 42
start = time.perf_counter()
for _ in range(n):
func(f"Hey {name}, your age is {age}!")
end = time.perf_counter()
duration = end - start
return duration
def benchmark_percentstring(func, n):
name, age = "John Doe", 42
start = time.perf_counter()
for _ in range(n):
func("Hey %s, your age is %s!", name, age)
end = time.perf_counter()
duration = end - start
return duration
def benchmark(n):
for func in (logger.error, logger.debug):
print(f"Running benchmark for n={n} and func={func.__name__}")
duration_fstring = benchmark_fstring(func, n)
duration_percentstring = benchmark_percentstring(func, n)
print(f"\tWith f-strings: {duration_fstring} seconds")
print(f"\tWith %-strings: {duration_percentstring} seconds")
if __name__ == "__main__":
benchmark(1_000_000)
When running the script with Python 3.7.16, the results are as follows:
$ python -u t.py 2> /dev/null
Running benchmark for n=1000000 and func=error
With f-strings: 21.419322967998596 seconds
With %-strings: 23.334060895998846 seconds
Running benchmark for n=1000000 and func=debug
With f-strings: 0.493426320997969 seconds
With %-strings: 0.34116271500170114 seconds
The results indicate that f-strings are approximately 10% faster in the error case but around 30% slower in the debug case.
Additionally, I have profiled the unit tests using the following command:
$ python -m cProfile -o output.pstats -m pytest test/unit
Upon analyzing the profiler output using tools like gprof2dot, logging accounts for approximately 0.75% of the total test suite time. Assuming a potential performance gain somewhere in between, say 20%, from using %-strings, the overall performance improvement would be around 0.15%, which might be considered negligible (after all, snowflake-connector-python doesn't spent most of its runtime logging).
- In regards to compatibility with custom loggers and handlers: Users may have specific logging requirements related to their organization policies or infrastructure. The most common example is redacting sensitive information, for which we can make a custom handler or formatter (a little bit in the same spirit as
snowflake.connector.secret_detector.SecretDetector).
class SnowflakeRedactingLogHandler(logging.Handler):
def emit(self, record: logging.LogRecord):
new_args = tuple(
(arg if not _should_be_redacted(arg) else "[REDACTED]")
for arg in record.args
)
new_record = logging.LogRecord(
name=record.name,
level=record.levelno,
pathname=record.pathname,
lineno=record.lineno,
msg=record.msg,
args=new_args,
)
super().emit(new_record)
If the logs are f-strings, then you won't have direct access to args and the only way to get around it is by using regular expressions😢on the already formatted message, and it's not necessarily that easy and can be prone to false positives.
- In regards to better error handling: Consider the following example
try:
some_var = None # suppose this is coming from a function for which the return type is hard to predict
logger.debug("Variable value: %d", some_var)
except Exception as e:
logger.error("Error in formatting debug log: %s", e)
In this example, if some_var is None, the %-formatting will not cause an immediate crash, and the error can be caught and logged instead. This would not be possible with f-strings or .format, as the error would happen before the logging call.
Hi! I'd like to work on this issue.
I have experience with Python logging best practices and would be happy to contribute by converting the f-string logging statements to % formatting throughout the codebase.
Before I start, I wanted to confirm:
- Is this issue still open for contribution?
- Should I create a PR that addresses all logging statements in the codebase, or would you prefer this to be done incrementally by module/directory?
- Are there any specific modules or files I should prioritize or avoid?
Looking forward to contributing to snowflake-connector-python!