
Benchmark go server using go tip

Open jonhoo opened this issue 9 years ago • 15 comments

The Go server is seeing relatively poor performance scaling compared to the Rust and C servers. Before reporting this as an upstream bug, we should investigate how the Go server performs when using the tip version of Go.

jonhoo avatar Jun 15 '15 20:06 jonhoo

go-tip

Alas, it seems as though the problem still arises on Go tip.

jonhoo avatar Jun 15 '15 21:06 jonhoo

Posted to golang-nuts.

jonhoo avatar Jun 15 '15 21:06 jonhoo

@jbardin points out in this reply on golang-nuts that the performance drop is probably caused by the overhead introduced by doing (e)polling instead of blocking socket reads. Continuing the discussion in #4 and #5.

jonhoo avatar Jun 16 '15 15:06 jonhoo

jon, you could run the test again with go tip. it has better goroutine performance (http://talks.golang.org/2015/state-of-go-may.slide#8), and now that we know the problem was in the async I/O, we can expect even better performance from golang.

i do have doubts about whether golang can achieve rust/c performance.

diegobernardes avatar Jun 17 '15 17:06 diegobernardes

Has go tip changed significantly in the past two days?

jonhoo avatar Jun 17 '15 18:06 jonhoo

Ah, you mean test go-blocking with go tip? Sure, I'll do that now.

jonhoo avatar Jun 17 '15 18:06 jonhoo

yep :]

diegobernardes avatar Jun 17 '15 19:06 diegobernardes

Done. See https://raw.githubusercontent.com/jonhoo/volley/1d9555441a2d5fa44a712a777fd95dae1503247a/benchmark/perf.png

Performance for go-blocking improves drastically for Go tip, almost to the point where it's as fast as the C and Rust implementations! Cool.

jonhoo avatar Jun 17 '15 19:06 jonhoo

@jonhoo This is great, thank you for doing these benchmarks. Nice to see that Go tip is catching up. :+1:

peterhellberg avatar Jun 17 '15 19:06 peterhellberg

It would be nice to see latency variance as well.

xekoukou avatar Jun 17 '15 20:06 xekoukou

@xekoukou pushed to https://github.com/jonhoo/volley/blob/master/benchmark/plot.dat

jonhoo avatar Jun 17 '15 20:06 jonhoo

@jonhoo i was a bit surprised by the golang latency in the plot.dat file. it is very fast now, but the latency, omg..

but i think i know what the problem is. one thing went unnoticed: rust and c are creating one thread per connection, while golang is creating one thread per cpu core.

looking at the plot.dat file, the only go-blocking-tip entry with low latency is the one where the number of connections equals the number of cpu cores (threads):

go-blocking-tip 40 40 39us 5.89us 1000000
rust            40 40 41us 6.68us 1000000
c-threaded      40 40 40us 7.91us 1000000

i don't know if there is any way to configure golang to create one real thread per goroutine.
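One standard-library mechanism that comes close is `runtime.LockOSThread`, which wires the calling goroutine to its current OS thread so that no other goroutine runs on it. The following is a minimal sketch only; `runPinned` is a hypothetical helper, not something from the volley code, and it is untested whether this changes the latency numbers. GOMAXPROCS still caps how many of these threads execute Go code at the same time.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runPinned (hypothetical helper) starts n goroutines, locks each one to its
// own OS thread for the duration of work, and waits for all of them to finish.
func runPinned(n int, work func(id int)) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Wire this goroutine to its current OS thread; no other
			// goroutine will run on that thread until it is unlocked.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			work(id)
		}(i)
	}
	wg.Wait()
}

func main() {
	results := make(chan int, 4)
	runPinned(4, func(id int) { results <- id })
	fmt.Println("workers finished:", len(results))
}
```

In a server, `work` would be the per-connection handler, which would give each connection a dedicated thread in the same way the Rust and C servers do.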

diegobernardes avatar Jun 17 '15 22:06 diegobernardes

Well, I could increase GOMAXPROCS, but that comes with its own set of problems unfortunately. It also shouldn't really matter; to quote the Go runtime docs:

> There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.
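For reference, the current limit can be inspected without changing it. This is a small sketch, not part of the benchmark code; `schedulerLimits` is a name made up for illustration. Passing 0 to `runtime.GOMAXPROCS` only queries the value.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerLimits (hypothetical helper) returns the current GOMAXPROCS limit
// and the number of logical CPUs usable by the process.
func schedulerLimits() (maxProcs, numCPU int) {
	// An argument of 0 queries the limit without modifying it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	maxProcs, numCPU := schedulerLimits()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", maxProcs, numCPU)
}
```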

An interesting benchmark to see would be a C implementation using a pool of workers instead of spawning a new thread for each request. That should give us more of an apples-to-apples comparison.

jonhoo avatar Jun 17 '15 23:06 jonhoo

> An interesting benchmark to see would be a C implementation using a pool of workers instead of spawning a new thread for each request. That should give us more of an apples-to-apples comparison.

Yes, it's better to do this.

> Well, I could increase GOMAXPROCS, but that comes with its own set of problems unfortunately. It also shouldn't really matter; to quote the Go runtime docs:
>
> > There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

Setting GOMAXPROCS equal to the number of CPUs only makes sense when requests are handled with Go's non-blocking features, so that when anything blocks, the thread picks up a new goroutine to execute. But in 'go-blocking' we are blocking the thread, so the number of threads matters in this case. The go-blocking app should accept an extra argument with the number of connections; only then will the test be fair.

Well, at least this is what I think. I haven't tested it, so I can't confirm.

I would make a PR, but I can't compile the C code to run the tests, and I don't know why :[
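The extra argument proposed above could look roughly like this. It is only a sketch under assumptions: the `-threads` flag name and the `parseThreads` helper are invented for illustration and are not part of the go-blocking server, and whether raising GOMAXPROCS actually helps here has not been measured.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// parseThreads reads a hypothetical -threads flag from args.
// A value of 0 means "keep the runtime's default GOMAXPROCS".
func parseThreads(args []string) (int, error) {
	fs := flag.NewFlagSet("go-blocking", flag.ContinueOnError)
	threads := fs.Int("threads", 0, "GOMAXPROCS value; 0 keeps the runtime default")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *threads < 0 {
		return 0, fmt.Errorf("threads must be >= 0, got %d", *threads)
	}
	return *threads, nil
}

func main() {
	n, err := parseThreads(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if n > 0 {
		// Raise or lower the number of threads that may run Go code at once.
		runtime.GOMAXPROCS(n)
	}
	fmt.Println("GOMAXPROCS is now", runtime.GOMAXPROCS(0))
}
```

The benchmark driver would then pass the connection count for each run, for example `-threads 40` for the 40-connection case.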

diegobernardes avatar Jun 18 '15 00:06 diegobernardes

It's not entirely clear how to interpret that statement from the docs. While it is true that we're blocking a user-level goroutine, we are also blocking on a system call, so it might be that Go is smart enough to then allow another goroutine to run. I'm not sure about this though.
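One way to probe this is a small experiment, sketched here under assumptions: it is Unix-only (it uses `syscall.Pipe`), `progressWhileBlocked` is a made-up name, and it says nothing about how the benchmark itself behaves. With GOMAXPROCS set to 1, one goroutine blocks in a raw `read(2)` on an empty pipe; if the runtime could not schedule anything else, the main goroutine would never get to write to the pipe and the program would hang.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// progressWhileBlocked (hypothetical helper) limits Go code to one running
// thread, blocks a goroutine in a read system call, and reports whether the
// caller could still run and unblock it.
func progressWhileBlocked() (bool, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return false, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	done := make(chan error, 1)
	go func() {
		buf := make([]byte, 1)
		for {
			// Raw blocking read: the thread sits in the kernel until data arrives.
			_, err := syscall.Read(fds[0], buf)
			if err != syscall.EINTR {
				done <- err
				return
			}
		}
	}()

	// Give the reader time to enter the system call before writing.
	time.Sleep(10 * time.Millisecond)
	if _, err := syscall.Write(fds[1], []byte{1}); err != nil {
		return false, err
	}
	if err := <-done; err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	ok, err := progressWhileBlocked()
	fmt.Println("made progress while a goroutine was blocked:", ok, "err:", err)
}
```

If this completes rather than hanging, it is consistent with the docs' statement that threads blocked in system calls do not count against the GOMAXPROCS limit.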

Can you open another ticket with the C compilation error you're getting?

jonhoo avatar Jun 18 '15 00:06 jonhoo