python-logstash-async
Performance improvements
While hunting for a bottleneck in one of our applications, I found a few more hotspots. Not that it was slow before, but when you push a few thousand messages per minute, some things just pop up.
This PR contains two small but measurable improvements. My approach was a simple timeit script, run standalone and with a profiler:
import timeit

# Format the same prepared LogRecord 100000 times and report the total time in seconds.
result = timeit.timeit(
    stmt="formatter.format(lr)",
    setup="""
from logstash_async.formatter import LogstashFormatter
from logging import Logger
import datetime
import logging
now = datetime.datetime.now()
formatter = LogstashFormatter()
# One record with a few extra fields, including a datetime
lr = Logger('dummy-logger').makeRecord(
    'Some name', logging.ERROR, 'blubb.py', 123, 'Some message',
    args=(), exc_info=None,
    extra={'foo': 'bar', 'baz': 123, 'somedatetime': now})
""",
    number=100000)
print(result)
With the current 2.3.0, this script completed (on my machine) in around 2.6 seconds.
The first change was to turn FORMATTER_RECORD_FIELD_SKIP_LIST into a set: it is only ever used for membership tests (x in y), which take constant time on average for a set but require a linear scan for a list. This brought the execution time down to 2.2 seconds.
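To illustrate the effect in isolation, here is a minimal sketch that times the same membership checks against a list and a set. The field names in SKIP_FIELDS_LIST and the helper count_kept are made up for this example and are not the actual contents or code of the library; absolute timings will differ per machine.

```python
import timeit

# Hypothetical skip list for illustration only; the real
# FORMATTER_RECORD_FIELD_SKIP_LIST is defined in logstash_async.
SKIP_FIELDS_LIST = [
    'args', 'created', 'exc_info', 'exc_text', 'filename', 'funcName',
    'levelno', 'lineno', 'module', 'msecs', 'msg', 'name', 'pathname',
    'process', 'processName', 'relativeCreated', 'stack_info', 'thread',
    'threadName',
]
SKIP_FIELDS_SET = set(SKIP_FIELDS_LIST)

# Field names as they might appear on a record with three extra fields.
field_names = ['name', 'msg', 'args', 'levelno', 'pathname', 'lineno',
               'foo', 'baz', 'somedatetime']


def count_kept(names, skip):
    """Count the names that are not in the skip collection."""
    return sum(1 for name in names if name not in skip)


# Same work, same result; only the type of the skip collection differs.
list_time = timeit.timeit(
    lambda: count_kept(field_names, SKIP_FIELDS_LIST), number=20000)
set_time = timeit.timeit(
    lambda: count_kept(field_names, SKIP_FIELDS_SET), number=20000)

print('list: %.4fs, set: %.4fs' % (list_time, set_time))
```

Names that are not in the skip collection (here the extra fields) are the worst case for the list, because every lookup has to scan it to the end.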
The other change was to exclude the fields in FORMATTER_RECORD_FIELD_SKIP_LIST directly in _get_record_fields. Previously, the formatter pulled all fields from the message and the extra into the dict and only removed the unwanted ones afterwards in _remove_excluded_fields. Since we're iterating over __dict__.items() in _get_record_fields anyway, we can skip those fields right there and save ourselves the del later on.
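Roughly, the difference between the two approaches looks like the sketch below. This is not the formatter's actual code: SKIP_FIELDS and both function names are hypothetical stand-ins, and the real methods do more than this (for example value conversion).

```python
import logging

# Hypothetical subset of skipped field names, for illustration only.
SKIP_FIELDS = {'args', 'created', 'exc_info', 'exc_text', 'msecs', 'msg',
               'relativeCreated', 'stack_info'}


def fields_collect_then_remove(record):
    """Old approach: copy every field, then delete the unwanted ones."""
    fields = dict(record.__dict__)
    for key in SKIP_FIELDS:
        fields.pop(key, None)
    return fields


def fields_skip_while_iterating(record):
    """New approach: leave unwanted fields out during the single pass."""
    return {key: value for key, value in record.__dict__.items()
            if key not in SKIP_FIELDS}


record = logging.Logger('dummy-logger').makeRecord(
    'Some name', logging.ERROR, 'blubb.py', 123, 'Some message',
    args=(), exc_info=None, extra={'foo': 'bar', 'baz': 123})

old_result = fields_collect_then_remove(record)
new_result = fields_skip_while_iterating(record)
print(sorted(new_result))
```

Both functions return the same dict; the second one just never inserts the entries that the first one has to remove again.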
With both changes, I'm down to 1.7 seconds.
I'm working on another improvement involving the datetime handling, but that will come in another PR as it involves a little bit more code.