log icon indicating copy to clipboard operation
log copied to clipboard

Random crashes when using Console Writer and multiple goroutines

Open tekert opened this issue 4 months ago • 8 comments

They are very very random but they all pop at the same time, always the same error, same function involved, no code change between errors. Using version 1.0.119

This is the crash: happenes just after printing a line that has emoticon. log.Info().Str("address", config.Server.ListenAddress).Msg("🌐 Starting HTTP server") Then.. CRASH

unexpected fault address 0xffffffffffffffff
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0xffffffffffffffff pc=0x81e3e5]

goroutine 1 gp=0xc0000021c0 m=0 mp=0x1054320 [running]:
runtime.throw({0xc066a5?, 0xc00016fb00?})
        C:/Program Files/Go/src/runtime/panic.go:1101 +0x4d fp=0xc00016fac8 sp=0xc00016fa98 pc=0x715e4d
runtime.sigpanic()
        C:/Program Files/Go/src/runtime/signal_windows.go:414 +0xd0 fp=0xc00016fb10 sp=0xc00016fac8 pc=0x6f7c70
github.com/phuslu/log.parseFormatterArgs({0xc000302100, 0x70, 0x100}, 0xc00016fd30?)
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/formatter.go:115 +0x485 fp=0xc00016fbd8 sp=0xc00016fb10 pc=0x81e3e5
github.com/phuslu/log.(*ConsoleWriter).write(0xc0001ba5e0, {0xcce100, 0xc000070068}, {0xc000368000, 0x70, 0x400})
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/console.go:62 +0x19c fp=0xc00016fc80 sp=0xc00016fbd8 pc=0x81931c
github.com/phuslu/log.(*ConsoleWriter).WriteEntry(0xc0001c6348?, 0x27?)
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/console_windows.go:42 +0x7e fp=0xc00016fcc0 sp=0xc00016fc80 pc=0x81b1de

IA analisys: Analysis Consistent Crash Point: All three stack traces you've provided are identical in their cause. They all crash inside github.com/phuslu/log.parseFormatterArgs with a memory access violation (addr=0xffffffffffffffff). This function is called by log.(*ConsoleWriter).write.

The Root Cause is a Race Condition: This is a classic race condition related to resource pooling. The phuslu/log library uses a sync.Pool to reuse log.Entry objects for performance. Here's the sequence of events that leads to the crash:

Your application starts up, and multiple goroutines (main, http server, etw session) begin logging concurrently. Goroutine A calls log.Info()...Msg("..."). It gets a log.Entry object from the pool and writes the log data into its internal buffer (e.buf). The Msg() function calls the writer's WriteEntry method. In your case, this is main.(*safeWriter).WriteEntry. Crucially, as soon as the WriteEntry call is made, the log.Entry object is put back into the sync.Pool for reuse (logger.go:1849). Goroutine B, running concurrently, makes another log call. It gets the exact same log.Entry object that Goroutine A just returned to the pool. Goroutine B immediately resets the buffer (e.buf = e.buf[:0]) to write its own log message. Meanwhile, the ConsoleWriter from Goroutine A's call is still busy parsing the buffer in parseFormatterArgs. It suddenly finds that the memory it's reading has been wiped or overwritten, leading to the memory access violation and the crash.

tekert avatar Aug 16 '25 14:08 tekert

Just a question: does the crash happens with non-emoji characters? If not, the root cause is that "parseFormatterArgs" function cannot handle emoji properly .

phuslu avatar Aug 16 '25 15:08 phuslu

It seems that it always want to write an emoticon when it happens, because the only code change i made was to delete a prior line that had emoticon (just the line above the crash) and i think the crash now moved 1 line down, to the next emoticon. I'm not sure but seems plausible.

Edit: i looked more closely, sometimes it prints the emoticon and continues just fine and crashes on the 2nd one.

tekert avatar Aug 16 '25 15:08 tekert

Thanks, let me try to fix it in this direction

phuslu avatar Aug 16 '25 15:08 phuslu

could please give a try with go get -v github.com/phuslu/log@99590f16ff47bc97d8e0d7ad95eb8f36d8d43312

I cannot reproduce this issue but I think it is related with windows console locking.

phuslu avatar Aug 17 '25 07:08 phuslu

I can confirm that these crashes happened with my safe writter that the only thing it did was lock a mutex before writing, the same as the commit you mention. It's so random, noticed it happens more when i change the size of the binary, also, i have C threads running trough and printing, in parallel, so goroutines may not catch this. But emoticons and threads seems to be related, of the 10 crashes i had in a month they all where in parseFormatterArgs.

EDIT: did a test forcing 1000 goroutines printing with emoticons etc, no crash the week before reporting this.. It's so evasive, even with the safeWriter wrapper it crashed after importing a new library. EDIT2: i forgot to mention, when this happens, its reproducible, you can spam it, but change a single byte in the code, do "go run ." and the crash is gone, until it pops again randomly after a few days

tekert avatar Aug 17 '25 10:08 tekert

Experiencing the same issue today with the patch.

unexpected fault address 0xffffffffffffffff
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0xffffffffffffffff pc=0x7ff632dc5580]

goroutine 42 gp=0xc000232c40 m=4 mp=0xc00007b808 [running, locked to thread]:
runtime.throw({0x7ff6331a6c51?, 0xc000037c88?})
        C:/Program Files/Go/src/runtime/panic.go:1094 +0x4d fp=0xc000037c50 sp=0xc000037c20 pc=0x7ff632cb9f8d
runtime.sigpanic()
        C:/Program Files/Go/src/runtime/signal_windows.go:406 +0xd0 fp=0xc000037c98 sp=0xc000037c50 pc=0x7ff632c9b2d0
github.com/phuslu/log.parseFormatterArgs({0xc000244400, 0x9f, 0x200}, 0xc000000000?)
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/formatter.go:115 +0x480 fp=0xc000037d60 sp=0xc000037c98 pc=0x7ff632dc5580
github.com/phuslu/log.(*ConsoleWriter).write(0xc00020a500, {0x7ff63326abe0, 0xc000078038}, {0xc000260000, 0x9f, 0x400})
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/console.go:62 +0x19c fp=0xc000037e08 sp=0xc000037d60 pc=0x7ff632dc043c
github.com/phuslu/log.(*ConsoleWriter).WriteEntry(0xc00020a500, 0xc000204cf0)
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/console_windows.go:44 +0x105 fp=0xc000037e80 sp=0xc000037e08 pc=0x7ff632dc2405
github.com/phuslu/log.(*Entry).Msg(0xc000204cf0, {0x7ff6331b76a0, 0x15})
        E:/Sources/go/sdk/Tekert/go/pkg/mod/github.com/phuslu/[email protected]/logger.go:1868 +0x16a fp=0xc000037ee8 sp=0xc000037e80 pc=0x7ff632dceb4a

tekert avatar Aug 22 '25 22:08 tekert

@phuslu I finally tracked the problem.

After a lengthy investigation, I've identified a race condition in parseFormatterArgs and have developed a minimal, effective patch.

The Problem

Under high load, with many goroutines writing JSON logs simultaneously, the application crashes with a low-level memory fault. This is not a standard Go panic and cannot be recovered.

Signal: [signal 0xc0000005 code=0x0 addr=0xffffffffffffffff]

The crash consistently occurs inside parseFormatterArgs when appending a new key-value pair.

Root Cause Analysis (this is where I asked the IA write my debug notes)

The bug is a race condition triggered when append is called on the args.KeyValues slice.

  1. In parseFormatterArgs, the args struct is reused, and args.KeyValues starts as a nil slice.
  2. When multiple goroutines enter parseFormatterArgs concurrently, they all see args.KeyValues as nil.
  3. The first call to append(args.KeyValues, ...) on a nil slice triggers an allocation for a new backing array.
  4. If two or more goroutines attempt this allocation simultaneously, they can race to update the slice header (the pointer, length, and capacity) on the shared args struct.
  5. This race corrupts the slice header, causing a subsequent memory access to read from an invalid address like 0xffffffffffffffff, leading to the fatal fault.

This explains why the bug is so elusive: it requires a specific timing of goroutines to trigger the race on the initial allocation, which only happens under heavy, parallel load.

The Fix

The solution is to prevent the race condition by ensuring append never operates on a nil slice. By pre-allocating the slice with a default capacity at the beginning of the function, we guarantee that it has a valid, non-nil header. Subsequent append calls will then safely operate on this existing slice.

This is the minimal, targeted patch that resolves the crash:

--- a/formatter.go
+++ b/formatter.go
@@ -60,6 +60,15 @@
 
 // parseFormatterArgs extracts json string to json items
 func parseFormatterArgs(json []byte, args *FormatterArgs) {
+	// Pre-allocate KeyValues slice to a reasonable capacity.
+	// This prevents a race condition that can lead to memory corruption
+	// when append is called on a nil slice from multiple goroutines.
+	args.KeyValues = make([]struct {
+		Key       string
+		Value     string
+		ValueType byte
+	}, 0, 16)
+
 	// treat formatter args as []string
 	const size = int(unsafe.Sizeof(FormatterArgs{}) / unsafe.Sizeof(""))
 	//nolint:all

The FormatterArgs struct is still allocated on the stack. Our patch simply changes when and how the backing array for the KeyValues slice is allocated. We now do it explicitly and safely at the beginning of parseFormatterArgs with make.

tekert avatar Sep 03 '25 22:09 tekert

OoooooH, brilliant job and thanks !

phuslu avatar Sep 04 '25 03:09 phuslu