go
go copied to clipboard
runtime: Stack() calls lock against each other unnecessarily, even when single-goroutine
What version of Go are you using (go version)?
$ go version 1.19.2
Does this issue reproduce with the latest release?
y
What operating system and processor architecture are you using (go env)?
go env Output
$ go env (encountered on Linux but I assume it shows up everywhere)
What did you do?
I saw a profile that showed 85% of CPU time in calls to runtime.Stack(), and 81% of total CPU time in calls to printlock and printunlock.
What did you expect to see?
I expected calls that were independent of each other, writing into buffers rather than to shared streams, on different CPUs, to proceed without lock contention.
What did you see instead?
Every single write in any part of the traceback process goes through print (and friends) ultimately, which means that it uses printlock and printunlock. This is true even when g.writebuf is non-nil and the different gs will be writing into different buffers. As a result, the stack tracebacks block each other, and since they're both making large numbers of individual calls to print, that results in heavy lock contention. (To complicate things, there's cases where a non-nil g.writebuf still doesn't cause the writes to go into the buffer, as when gp.m is dying.)
This looks like a mess to fix. It's "easy" to fix by changing the code to thread a write target through it, except that you absolutely don't want any of this code dealing with stuff like io.Writer or other higher-level abstractions. I also don't know whether it's actually completely safe to have multiple copies of print running, against different write buffers, on different gs. In the case where you really are writing to a stream, you probably don't want a buffer. You don't want to have two copies of all of this code, identical except for their output. But having two calls to runtime.Stack() generate hundreds of contended locks seems pretty bad too...
cc @golang/runtime
In triage this doesn't seem like high priority to look at. Stack could probably avoid the printlock but that might be tricky today. This might be more doable if we clean up gentraceback in general (CC @aclements).
(In other words, effectively blocking this on #54466.)
Note that using runtime.Callers should avoid this lock contention.
There was also some discussion in the Gophers Slack about one possible hack fix:
func printlock() {
- mp := getg().m
+ gp := getg()
+ if gp.writebuf != nil {
+ return
+ }
+ mp := gp.m
(and the same for printunlock)
That's not the only reason printlock exists. It also ensures that if two goroutines call print in parallel with multiple arguments then the output will not be interleaved.
Right. But if I read the code correctly (big if), when gp.writebuf is non-nil, we are always non-concurrent.
Perhaps you are not aware that the compiler generates calls to printlock. When the compiler sees print(a, b, c), it generates code like
printlock()
printint(a)
printint(b)
printint(c)
printunlock()
I am indeed quite aware of that. :)
In those cases, however, gp.writebuf is nil (the output goes straight to stderr). gp.writebuf is only non-nil for runtime.Stack and a runtime tests.
Sure, but my point was that if two different goroutines call print in parallel with multiple arguments, their output will not be interleaved. And that still seems true to me. Am I missing something?
(It's true that the output from print can be interleaved with other output that the program might be writing to standard error. That seems harder to avoid, though as you imply it could be done by setting writebuf.)
Am I missing something?
I hope so? :P
The idea of the patch above was that we effectively disable printlock and printunlock only on the relevant G when its writebuf is non-nil. writebuf is non-nil only when runtime.Stack is actively using its (systemstack's) G to print a stack. All other goroutines that are printing will have nil writebufs, and the locking will proceed as usual.
Argh, sorry, I get it now. Completely misunderstood. Apologies.
I did note that runtime.Callers produces part of this data without locking, but there's no way to get the argument info, if you want it, and also there's no easy way to just get the pretty-printed output. So, it solves the performance problem but it loses information; so far as I can tell, there's nothing available to a user that can produce the call stack with args that isn't this.
So far as I can tell, writebuf can only be non-nil during a call to runtime.Stack(), and during one specific test that also assigns to it for some reason. There's no other uses in the code.
So I think in practice the "just dummy out locks when writebuf is non-nil" behavior would be correct, but I'd feel a bit nervous about it being brittle. I'm also not 100% sure I know what it would do in the case where you request tracebacks for all goroutines, but I assume it works. (I also note that if you request tracebacks for all goroutines, the world is stopped, so, good news, no lock contention, bad news, you already took every performance hit.)
That said, I'm certainly not about to volunteer to try to make the traceback stuff work differently, that looks like a really hard problem.