Prototype icon indicating copy to clipboard operation
Prototype copied to clipboard

Figure out why Serilog logging is slow

Open QuintillusCFC opened this issue 3 years ago • 13 comments

As part of investigating #273, I determined that the logging I was adding to get a better idea of what was taking so long was actually slowing things down, in some cases significantly more than what I was investigating in the first place. This investigation has now gone deep enough that it warrants being split off into its own issue.

This is good to know sooner rather than later, as it seems to be much slower than Log4J in the Java world, and if we are going to switch to e.g. Log4Net or another .NET solution, it's easier to do so early.

QuintillusCFC avatar Jun 16 '22 15:06 QuintillusCFC

Copying from the other thread:

Plot twist - logging is slow. Not sure why/what part of it is slow (beyond the Filter.ByIncludingOnly part of LogManager not making a difference). But even if I just log a hard-coded string, in my seeded-random-number example where it looks at 6 paths, it takes the average from 2 to 11 milliseconds, or +1.5ms per path. This means that the log statements are responsible for over 80% of the run time. It's a sort of Heisenbug situation where applying debug methods to evaluate the state changes the characteristics of what you're trying to debug.

Hmm, if I leave off the Write To Godot Sink, but leave on the log of a hard-coded string, the overhead from six logs shrinks from 9 ms to 1 ms. That is a major reduction. Gotcha: Leaving out that line also takes our the formatter. So it could be the sink that's slow, it could be my console that's slow, or it could be the formatter that's slow. But the log.Warning("hardcoded string") itself is not especially slow.

QuintillusCFC avatar Jun 16 '22 15:06 QuintillusCFC

Did some more testing. The random string hard-coded as 123, measuring the Dutch catapult (the last unit on turn 1) figuring out where to explore, where it checks 10 possibilities and logs how long each of those 10 takes. The times across 5 runs are:

no logs (just the pathing):            1, 1, 1, 1, 1      = 1    ms average
hardcoded, but no Godot Sink:          1, 2, 2, 2, 2      = 1.8  ms average
hardcoded, Godot, without formatting:  5, 6, 7, 6, 6      = 6    ms average
hardcoded, Godot, with formatting:     16, 16, 14, 16, 16 = 15.6 ms average
real log, Godot, with formatting:      23, 25, 34, 23, 31 = 27.2 ms average

"log statement" means a log statement for each tile possibility; "no logs" just logs the cumulative time for all 10 tiles' pathing at the end.

This is not encouraging as it seems that all the components of the logging framework are contributing. We get an extra 0.8 ms just from having the log statement, an extra 4.2 ms from logging to the Godot sink (which is the part that could be system-dependent), a whopping 9.6 ms from our formatter, and an extra 11.6 ms from passing in actual variables to the formatter.

Those times are across all 10 tiles, so the last one averages 2.7 ms per tile, of which 2.6 ms is logging overhead.

These are all based on variations of commit 9f7b9ef095f70ef20c0fa778662cf5a8d744385e

QuintillusCFC avatar Jun 16 '22 16:06 QuintillusCFC

One more config, and re-mentioningwhat it compares to from above:

hardcoded, Godot, but too low to long: 1, 1, 1, 1, 1      = 1    ms average
hardcoded, Godot, with formatting:     16, 16, 14, 16, 16 = 15.6 ms average

This is the first piece of good news. Filtering out log statements is fast, adding virtually no overhead for things that don't get logged. This should be fast, and is. This is with using a .Verbose with Debug as the min (and the Filter.ByIncludingOnly still enabled; it should be irrelevant but comes first in the config and thus might get processed).

Overhead, non-cumulative, for 10 logs combined, 5-run average:

0.8 ms: Not getting filtered out
4.2 ms: Getting sent to the Godot sink (may be system-dependent to some extent)
9.6 ms: The formatter getting invoked
11.6ms: The formatter having to deal with variables

When is this a problem?

1. If we have too many logs enabled in general.  If we tell someone to run with debug logging, and we have lots of debug logs, it could slow things down a lot.
2. Adding extra logs at a granular level to investigate the performance of tight loops, as I've done here.

Going to read some blogs about Serilog and slowness as a next step.

QuintillusCFC avatar Jun 16 '22 16:06 QuintillusCFC

Found some good info.

https://github.com/serilog/serilog-sinks-console/issues/58, especially the first comment from skomis-mm, is really helpful. Basically, the configuration we have may be invoking system console commands a lot, and thus slowing things down. Unfortunately, I can't swap to AnsiConsoleTheme, because:

Argument type 'Serilog.Sinks.SystemConsole.Themes.ConsoleTheme' is not assignable to parameter type 'Serilog.Templates.Themes.TemplateTheme?'

(I'm still a bit baffled by why half the themes aren't compatible with the other half of the themes)

That page also mentions that asyncing the logging should help at least somewhat, though perhaps not completely solving the problem.

It's also worth noting that skomis-mm mentions that Serilog is a structured-first logging framework, and Log4Net may not have the same console performance problems. While we do plan to enable file logging at some point, I have to wonder whether we've picked the wrong tool for the job. Serilog seems well-suited to enterprise work where you'll have Splunk/Elasticsearch/etc. to use that structured logging, but we're a local desktop application that's going to have either a console log for developers, or a flat file that users can upload when reporting bugs. I.e. all the fancy things Serilog does to make it great for enterprise might make it not great for a desktop application.

There's also someone who created a Fast Console sink at https://github.com/manigandham/serilog-sinks-fastconsole . It doesn't support themes and colors, but does async behind the scenes.

QuintillusCFC avatar Jun 16 '22 16:06 QuintillusCFC

Added a file sink instead, and got better results. With the same data:

real log, custom output format, file:  2, 2, 2, 2, 2      = 2.0  ms average

This is thrice as good as any test with the Godot sink, even the one with no formatting, and very near the result for having no sinks at all.

Another case:

Random exploration for Warrior on Babylon's turn 4, looks at 23 tiles.

With only file, took 10ms total (including logs). With Godot sink, took 59 ms.

With no sinks, also 10 ms. With no Serilog logs, took 8 ms.

File sink config:

			.WriteTo.File("log.txt", buffered: true, flushToDiskInterval: TimeSpan.FromMilliseconds(1000),
			              outputTemplate: "[{Level:u3}] {Timestamp:HH:mm:ss} {SourceContext}: {Message:lj} {NewLine}{Exception}")

Conclusion: File sinks are a lot quicker than console sinks.


I also briefly looked at NLog and Log4Net. Log4Net requires some additional config to work with .NET Core, but it appears to be possible. NLog is theoretically slower than Serilog, with the major caveat that the blog that evaluated it wasn't factoring in Serilog enhancers, which may slow Serilog down. They were also looking at file output, not console.

QuintillusCFC avatar Jun 17 '22 09:06 QuintillusCFC

Interesting. Thanks for chasing this down. My first thought was that with Serilog being a scalable enterprise tool I'm sure its performance is up to the task, but maybe logging directly to text just isn't what it's optimized for. Tagging @pcen, since he introduced Serilog, and @TinaFemea for insight.

When you say Godot with/without formatting, did you just exclude the formatter parameter? Within GodotSink the obvious culprit would seem to be the StringWriter created in each call to transform the formatter into a string. That seems wasteful but again I was just following an example. But that's only half the answer, if enabling the GodotSink without it still tripled the time.

I'm very surprised that the file sink is so much faster than the console sink though, and can't speculate why that would be. It would be wild if tailing the log file turns out to be faster than logging directly to console. I got a bit lost in the stats by the end. How do console sink with no formatter, fast console sink, and file sink compare? It sounds like AnsiConsoleTheme is worth looking into, if we can configure something to use it.

Work has been kicking my butt so I probably won't be back in the code for the next few days at least.

WildWeazel avatar Jun 26 '22 23:06 WildWeazel

I came across this in one of my searches: https://benfoster.io/blog/serilog-best-practices/ Not an answer to anything, but some interesting hints. The sections on timing operations, message templating, and configuration may be helpful.

WildWeazel avatar Jun 27 '22 04:06 WildWeazel

Formatting in logging is consistently an issue across all loggers I've ever used. What we do to mitigate it is wrap logging statements in tag-level tests - so something like:

if (Logger.IsEnabled(Debug, TroopMovementTag)) Log.Debug(VeryLongOrFormattedStringHere)

In places where we can't afford to lose the performance.

We've also had good luck with adding a new Log.Debug method that takes a Func as a parameter instead of an actual string. Then Log.Debug only calls it when it knows it's going to write it out (and can do the tag-level check internally).

TinaFemea avatar Jun 27 '22 07:06 TinaFemea

Interesting. Thanks for chasing this down. My first thought was that with Serilog being a scalable enterprise tool I'm sure its performance is up to the task, but maybe logging directly to text just isn't what it's optimized for. Tagging @pcen, since he introduced Serilog, and @TinaFemea for insight.

When you say Godot with/without formatting, did you just exclude the formatter parameter? Within GodotSink the obvious culprit would seem to be the StringWriter created in each call to transform the formatter into a string. That seems wasteful but again I was just following an example. But that's only half the answer, if enabling the GodotSink without it still tripled the time.

I'm very surprised that the file sink is so much faster than the console sink though, and can't speculate why that would be. It would be wild if tailing the log file turns out to be faster than logging directly to console. I got a bit lost in the stats by the end. How do console sink with no formatter, fast console sink, and file sink compare? It sounds like AnsiConsoleTheme is worth looking into, if we can configure something to use it.

Work has been kicking my butt so I probably won't be back in the code for the next few days at least.

I believe that's correct, "without formatting" took out the formatter parameter, so it went to the simpler if (_formatter is null) block in GodotSink. But it's been a few weeks now, and I didn't create a commit specifically for that configuration in my "scratchpad" branch, so that's 90% rather than 100% reliability.

I hadn't noticed the StringWriter being re-created each time; you're right that that adds some inefficiency. Maybe that's part of the 3.4ms from the GodotSink versus no GodotSink; I wouldn't have guessed it would take that long to create a StringWriter but it's greater than zero.

As accurately as I remember it (really just piecing it together from what I wrote a few weeks ago); this is from the the Dutch catapult (the last unit on turn 1) figuring out where to explore, where it checks 10 possibilities and logs how long each of those 10 takes example

file log: 2 ms average GodotSink, without formatting/variables: 6 ms average GodotSink, with formatting: 27.2 ms average

I wasn't able to get the "fast console sink" working, IIRC it uses one style of formatters and our code uses the other (still not sure why there's two that are incompatible).

QuintillusCFC avatar Jul 05 '22 17:07 QuintillusCFC

I think I did see that Ben Foster blog while researching this, and read at least part of it.

Formatting in logging is consistently an issue across all loggers I've ever used. What we do to mitigate it is wrap logging statements in tag-level tests - so something like:

if (Logger.IsEnabled(Debug, TroopMovementTag)) Log.Debug(VeryLongOrFormattedStringHere)

In places where we can't afford to lose the performance.

We've also had good luck with adding a new Log.Debug method that takes a Func as a parameter instead of an actual string. Then Log.Debug only calls it when it knows it's going to write it out (and can do the tag-level check internally).

Yeah, that's what you have to do in Log4J 1.x to get good performance. I think they resolved that in Log4J 2.x, so it generally isn't necessary anymore. But I only really have a lot of experience in Log4J and Logback, not in any .NET specific loggers (beyond what I've learned here).

That is interesting about passing a Func instead of a string, makes sense that that would save the formatting till later, and only if it's enabled. I wonder if that's possible with Serilog? I've noticed Rider suggests using templates rather than formatting the string manually as a parameter, e.g. Log.Information("unit with attack/defense {0}/{1} entered combat", unit.attack, unit.defense), and I did try that, but did not see any difference in performance. But that isn't quite the same.

(Also before going back and hacking away at Serilog more, I'd like to do a few "real" C7 tasks. This wound up being a huge side quest)

QuintillusCFC avatar Jul 05 '22 17:07 QuintillusCFC

Moving this back to "To do" so it can be picked up for a "Part Two". Whether that really needs to be done for Carthage probably depends on how things go and if it continues to be a pain point or not.

QuintillusCFC avatar Jul 05 '22 17:07 QuintillusCFC

Reopening so it doesn't get overlooked

WildWeazel avatar Jun 11 '23 03:06 WildWeazel

Is this still an active limiting factor? If so, could we open a "part 2" to this, linking to the research here? I hadn't been noticing any issues relating to Serilog in the past year, so I thought it made sense to have this closed as having solved the immediate problem. #259 is the longer-term goal that should make further improvements, and if we want a "fine-tuning Serilog" card, that's fine, but I don't see any benefit to having this be re-opened unless there's evidence that it's a problem now.

(General project-management philosophy, I prefer smaller tasks that improve things as they go to ones that last weeks, months, or years. Search and links between issues can help us find earlier research on the same topic, as can wiki pages if it gets that complex. I think further improvements should also be in "Dutch" or later; never-ending tasks contribute to the feeling that the project is stuck in a rut and not moving forward)

QuintillusCFC avatar Jul 17 '23 19:07 QuintillusCFC

I was of the opinion that this was solved-for-now and if it ever became a problem in the future, we could start a new issue. My vote it to do nothing else on it for Carthage.

QuintillusCFC avatar Oct 03 '24 04:10 QuintillusCFC