Is it possible to log program crashes to the system log?
I have a program that uses lots of goroutines, and I suspect that a rare panic is possible. This program runs as a service on Windows, and as far as I can tell that means you cannot access stderr to view the stack trace when this happens (I'm not a Windows expert).
I've been investigating how to capture one of these crashes. Obviously the brute-force approach would be to put recovers everywhere, but this program is very complex and the crash is rare. Is there a better way to track down the issue?
I've got two ideas right now, neither of which is ideal. The first is to do something equivalent to dup2 to redirect stderr to a file, and bypass the Windows service logs entirely. The second is to make the service program a dumb wrapper that spawns the real program as a child process, then reads its stderr and logs it.
For logs within the service you can of course log to a file.
But afaik if the service itself crashes, Windows logs the message/error code to system events (Open Event Viewer). You can perhaps refactor your code to exit with specific error codes based on where the panic is happening and cross-check in system events.
> But afaik if the service itself crashes, Windows logs the message/error code to system events
Based on my experience yesterday, I encountered a hard-to-debug problem where a panic occurred but wasn't logged to the event logs.
Windows of course detected the crash, and logged the fact that the service crashed, but the panic itself (and associated stack trace) was never logged. All I got was this log from Windows:
The <foo> service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 30000 milliseconds: Restart the service.
When I ran the service interactively from a terminal, the actual error and panic were printed to stderr, but it appears Windows does not capture stdout/stderr from a service process. (Which makes sense; event logs are the defined way to log things for services, not stdout/stderr.)
I'm not sure whether the correct solution is to perform a recover() and try to print the panic to the nearest windowsService.Logger(), or whether the solution is to override os.Stdout and os.Stderr, pointing them to the event logger.
Either way, the current state of allowing panic logs to go silently unreported is a bad state of affairs.
Hmmm, doing some further reading I found https://github.com/golang/go/issues/42888 where it looks like the issue is more complex.
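Basically, panic() does not write through os.Stderr (the runtime writes directly to file descriptor 2, the process's standard error handle, so reassigning os.Stderr has no effect), and recover() only works for the current goroutine. So it's really impractical to solve this inside this module.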
In the upstream issue, they are looking at solving this inside the Go runtime (and/or the x/sys/windows package) by writing to the event logs in the event of a panic (possibly needing a call to a SetCrashEvent() function), which does sound like a much better thing to wait for, since it will catch very low-level problems.