fgprof
fgprof copied to clipboard
A small performance optimization
Hey Felix,
I was playing with runtime package/profilers a bit for learning and while trying out different things, I happen to see following code is used to retrieve all Goroutines:
for {
n, ok := runtime.GoroutineProfile(p.stacks)
if !ok {
p.stacks = make([]runtime.StackRecord, int(float64(n)*1.1))
} else {
return p.stacks[0:n]
}
}
However it turns out following is a bit faster as runtime.NumGoroutine() just returns len(gcount()) whereas GoroutineProfile seems to do bunch of other things.
p.stacks = make([]runtime.StackRecord, int(float64(runtime.NumGoroutine())*1.1))
for {
...
}
I have benchmarked following function:
func GoroutineProfile(useNumG bool) {
var stacks []runtime.StackRecord
if useNumG {
nGoroutines := runtime.NumGoroutine()
stacks = make([]runtime.StackRecord, int(float64(nGoroutines)*1.1))
}
for {
n, ok := runtime.GoroutineProfile(stacks)
if !ok {
// this can happen regardless of our approach
stacks = make([]runtime.StackRecord, int(float64(n)*1.1))
} else {
break
}
}
}
And the results are:
go test -bench=. -benchmem ~/Desktop/p/go-tryouts
goos: linux
goarch: amd64
BenchmarkGoroutineProfile/Using_runtime.NumGoroutine()-12 111543 9944 ns/op 768 B/op 1 allocs/op
BenchmarkGoroutineProfile/Not_using_runtime.NumGoroutine()-12 66451 16140 ns/op 768 B/op 1 allocs/op
While not too significant, it is not bad for a single line of change.
Would you be interested for a PR for this? Is there any other problem with the approach?
Hey @sumerc thank you so much for taking an interest in this and reaching out. I love geeking out about this stuff : ).
As far as your change is concerned, I'd love to see a PR for it. I can totally see how it's faster for the function you've shown, but the code inside of fgprof is slightly different. The profiler struct is written in a way that the costs of discovering the ideal size of the stacks slice is amortized by only doing it once in the beginning. So your optimization would only help with the very first profiler.GoroutineProfile() call.
Anyway, if it's a small change I'd still be happy to merge it. And I'm also happy to just discuss ideas on how to make things faster in general!
Anyway, if it's a small change I'd still be happy to merge it. And I'm also happy to just discuss ideas on how to make things faster in general!
Me too! Thanks for such inclusive/positive comment :)
The profiler struct is written in a way that the costs of discovering the ideal size of the stacks slice is amortized by only doing it once in the beginning
I understood. p.stacks is only initialized once. Well, I also confirmed you are right. The change did not change(too much) the output go test -bench=BenchmarkProfilerGoroutines but in anycase, I will submit the PR. Up to you to accept/reject :)
Thanks again!
Closing this since https://github.com/felixge/fgprof/pull/11 was closed.