pool icon indicating copy to clipboard operation
pool copied to clipboard

Performance problem with a single stripe

Open parsonsmatt opened this issue 3 years ago • 8 comments

Matt Russell on Twitter identified that Data.Pool is evidently very slow with a single stripe.

Also the data.pool thing is pretty crazy. With many threads (i.e. any webserver) hitting the same pool the whole thing just locks up. It's doing tons of retries in stm. Single biggest bottleneck as we scaled. Stripes help...but still don't love the performance

parsonsmatt avatar Dec 30 '20 23:12 parsonsmatt

To provide some context (I work with Matt)- here's a benchmark we made to try to better understand the effect of changing the number of stripes:

Repo: https://github.com/Simspace/resource-pool-benchmarks

Example run (grouped by total pool size, higher number of updates is better): run.txt

It makes sense to me that having only 1 or 2 stripes would incur a performance penalty, but I guess the question is whether this level of drop off is expected. The benchmark appears to show that a small number of stripes is worse by a significant amount.

I did play around a bit with swapping in a TQueue for the entries list, which perhaps could reduce contention. My recollection is that it was a little helpful, but not a silver bullet, but I don't think I ever got a full set of benchmarks on it. I can run those benchmarks if that would be helpful.

Cheers

asivitz avatar Jan 05 '21 16:01 asivitz

Oh, what connection pool settings are you all using with what levels of traffic? Lumi’s business doesn’t strike me as needing to process huge volumes of API requests like say, an ad network, so wondered what kind of volume this becomes an issue at.

MaxGabriel avatar Jan 06 '21 17:01 MaxGabriel

Different Matt :smile: SimSpace probably handles a lot more traffic than Lumi.

parsonsmatt avatar Jan 06 '21 17:01 parsonsmatt

Ahh ok

MaxGabriel avatar Jan 06 '21 17:01 MaxGabriel

Ah sorry for the delay, this got buried in my inbox.

I think there are a bunch of ways to mitigate the problem (better caching, push vs pull, pgbouncer?), but with some areas that have a "naive" implementation we are able to trigger this issue with 1 stripe easily around several hundred requests per second. Around 200-300 I think, although I dont have the numbers easily on hand. Using more stripes certainly improves the situation, although we've found it to still be a less of a major speedup than we anticipated.

mjrussell avatar Jan 25 '21 04:01 mjrussell

I found a bug in the resource-pool-benchmarks program above. It relies on the resources being destroyed by the time destroyAllResources returns, but it looks like the resources are being destroyed asynchronously. In particular, if I tweak the test such that 10 threads grab all 10 resources and then block until they are killed by the killThread, the test reports a number of resources being used which is less than 10. If I switch from forkIO/killThread to async/cancel, then I do get 10.

gelisam avatar Mar 24 '21 01:03 gelisam

I have added a branch which fixes the bug, but also switches the focus of the benchmark from the number of stripes to the "doing tons of retries in stm" claim.

stm uses the optimistic concurrency control approach to transactions, meaning it runs transactions in parallel and hope they don't conflict. This approach works best when most threads happen to use distinct resources. If this assumption is not satisfied, that is, most threads happen to touch the same TVars, then all the transactions are going to fail their validation step at the end of the transaction and they are going to retry. If this is what is happening in this case, then the solution would be to switch from stm to something which uses pessimistic concurrency control, like MVars.

I have thus changed the benchmark to pit several pool implementations against each other:

  • Data.Pool with one stripe
  • an implementation based on an IORef
  • two implementations based on an MVar
  • an implementation based on STM

If my hypothesis is right, then Data.Pool and STM should both perform poorly compared to the MVar implementations. However, the results were completely different from what I expected:

results (higher is better):
Data.Pool:          479398
IORefPool:        30989419
YieldingMVarPool:   149114
BlockingMVarPool:   117487
TVarPool:           270897

The IORef implementation wiped the floor, Data.Pool had a very good showing, and my hypothesis was destroyed: MVar performed worse than Data.Pool and STM.

Note that my implementations do a much simpler job than Data.Pool, so please don't interpret those results to mean that Data.Pool should switch to IORef. Instead, the conclusion I am drawing from this is that Data.Pool is doing a damn fine job compared to the obvious alternatives, even with a single stripe, and even under the ridiculously-high load of grabbing and releasing resources from the pool in a tight loop, which is much faster than http requests could come in.

gelisam avatar Mar 24 '21 16:03 gelisam

linking https://github.com/bos/pool/pull/42

coderfromhere avatar Dec 14 '21 22:12 coderfromhere