gelf-extensions-logging
gelf-extensions-logging copied to clipboard
Add TCP Support
A single TCP stream will be used, using a configurable default timeout of 1000ms.
By default, all TCP errors will be ignored. By enabling ThrowTcpExceptions in the options, the TCP exceptions will propagate to the application, which could be useful for debugging purposes.
Solves issue #64
Hrm... maybe wait with merging. The way we're using TcpSockets here turns out to be unstable in IIS server environments. We'll need to move back to a setup that opens sockets per batch. Will adjust that shortly.
No problem, let me know when you're happy with the changes.
We've learned that Windows Server or IIS (or maybe our infrastructure) sometimes likes to close TCP sockets without prior warning after a few seconds. It doesn't seem to happen in Linux and Kubernetes environments. It looks like keepalive is not working correctly. When it happens, the socket is still presented as being connected, but in reality, it will fail with an IOException as soon as you try sending data.
Latest commit adds a single retry to account for these cases. If it fails twice in a row (very unlikely), the client will carry on.
Some minor feedback before we merge this.
I just had a look a quick look at how Serilog does this. They check TcpClient.Connected rather than catching IOException which is a little cleaner IMO.
I'm surprised you need a lock in TcpGelfClient - messages are added to an internal queue and sent 1 at a time in the background, even if the application is logging from multiple threads
GelfLoggerOptions.ThrowTcpExceptions is redundant because exceptions thrown while sending logs occur on a background thread and don't bubble up to the calling application (see here). Changing this setting will have no impact.
Totally agree on the checking of Connected, that's also the first check we do. (src/Gelf.Extensions.Logging/TcpGelfClient.cs Line 61)
But in all cases where the infra or OS layer closed the connection, the state was still Connected, while the first next write failed immediately with an IOException. After that exception is thrown, the Connected flag goes to false. With this check in place, if Connected is indeed false, we won't wait for the exception. It will only trigger if Connected=true, while the socket still refuses. If we don't handle that IOException, we would miss one message every time it happened.
It's really baffling me. We haven't been able to reproduce the behavior in k8s or locally in docker, but it consistently occurs in IIS in production.
I was not 100% certain about the threading of the background process, because there are asynchronous operations too. If it's guaranteed to be single-threaded, the locking is indeed pointless - as is the asynchronicity.
If you wouldn't mind removing GelfLoggerOptions.ThrowTcpExceptions, I can get this merged for you.
Hi, I would be really interested in this feature getting merged, because I have a great opportunity to use it at work. Is there any possibility that you, @joepb, could remove GelfLoggerOptions.ThrowTcpExceptions as requested?
And if there is any more help needed, I'd be happy to help out.