trafficserver
trafficserver copied to clipboard
Fast log buffer
This adds a configurable fast mode to log objects that use per-thread log buffers rather than having all threads contend for per-object buffers. While this is much faster, it does not support ordered log entries. This makes it situationally useful, but if a user has binary logs or doesn't need ordered log output, this could allow them to log many more transactions. Its configurable either globally via records.config or per-log via logging.yaml.
What performance difference does your benchmark show? How about a cut-n-paste of the output?
benchmark name samples iterations estimated
mean low mean high mean
std dev low std dev high std dev
-------------------------------------------------------------------------------
logobject fast 100 1 50.7197 s
497.339 ms 482.904 ms 502.998 ms
43.2047 ms 20.2149 ms 91.1906 ms
logobject slow 100 1 11.9278 m
5.55437 s 5.43973 s 5.68358 s
619.759 ms 508.693 ms 748.277 ms
Here is the benchmark output. The fast log buffers are about 10x faster. The performance improvement greatly depends on the performance of the CAS instruction, which on intel is abysmal under high contention. For AMD the perf difference is far less dramatic. As an anecdotal difference, we were able to run full logging (without sampling) without a noticeable drop in requests per second and fairly instantly filled a 10gig RAM disk with "fast" log files. Before this change, enabling logging would dramatically reduce our requests/second.
Also, I have to modify other parts of the code to run the benchmark since the JemallocNodumpAllocator creates (and leaks) thread local arenas, it cannot support as many thread starts as the benchmark spawns, but using the ink_freelist_* functions just adds another CAS bottleneck which ruins everything. I've included a commit that replaces all of that with a straight malloc Allocator class in case you want to try to reproduce this, but I was going to add that as an optional allocator in a separate PR. I'd like to remove that from this PR before any potential merging of this code (assuming it's acceptable).
Has this been tested in production at all?
This has only been tested in lab environments
Looks good except for nitpicks.
[approve ci]