
[Elixir/phoenix] Implementing suggestions from @josevalim

Open atavistock opened this issue 1 year ago • 12 comments

Conversation in https://github.com/TechEmpower/FrameworkBenchmarks/pull/9198 led to some meaningful optimizations.

Specifically:

  • Obtain one db connection for all queries within a request.
  • Optimize the flow for fetching world ids.

atavistock avatar Sep 30 '24 01:09 atavistock

This is beautiful, thank you for working on it! :heart:

The only other feedback I have, which we should explore in a separate pull request anyway, is to increase the pool size. For multiple queries, we are not getting more performance from high concurrency numbers, and I assume this is either because our pool size is too small (we use 50, some of the Rust repos use 1024) or our pooling is a bottleneck (it is a single process).

I am away from my computer with 10 cores, so when I am back home, I want to run some benchmarks on top of Ecto pooling and try to get some numbers. In particular, I may want to add multiple pools inside Ecto, and then we can try running 8 pools with 32 connections each or similar. But we should probably wait for this PR to be merged and see a trial run, so we can see baseline numbers. :)

If you want, I can keep you posted on the Ecto efforts. Have a fantastic week!

josevalim avatar Sep 30 '24 07:09 josevalim

@srcrip I just wanted to drop a note that this PR was in flight doing Repo.checkout where you were adding Repo.transaction. I tested with checkout, transaction, and checkout+transaction, and in these specific performance tests just using Repo.checkout gives the best performance, presumably because there are very few writes.

atavistock avatar Oct 03 '24 18:10 atavistock

Correct. transaction is checkout+begin+commit, but if you are only doing reads, you don't need begin+commit. :)
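
As a rough sketch of the difference (the World schema and the ids list below are placeholders, not the benchmark's actual code):

   # one connection is checked out for the whole function, so every
   # read below reuses it without BEGIN/COMMIT overhead
   Repo.checkout(fn ->
     Enum.map(ids, fn id -> Repo.get!(World, id) end)
   end)

   # Repo.transaction(fn -> ... end) would also pin a single connection,
   # but adds the extra BEGIN/COMMIT round trips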

josevalim avatar Oct 03 '24 18:10 josevalim

I have added pool_count support to EctoSQL v3.12.1+: https://github.com/elixir-ecto/ecto_sql/pull/636

My suggestion is that, once this PR is merged and we measure its new baseline, we should update ecto_sql and use this configuration:

pool_count: 16,
pool_size: 32

This is exactly 512 connections. We can also try pool_count: 32, pool_size: 32 (some Rust benchmarks use 1024 connections) but, given this is tuning, we need to have the numbers to compare against. :)
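
For illustration, those settings would sit in the repo configuration roughly like this (the app and repo names are placeholders, not the actual benchmark code):

   config :my_app, MyApp.Repo,
     pool_count: 16,  # requires ecto_sql v3.12.1+
     pool_size: 32    # 16 pools x 32 connections = 512 total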

josevalim avatar Oct 07 '24 08:10 josevalim

Thank you @atavistock and @josevalim, yes that makes sense. I also agree that we probably need to increase the pool size.

When I made some initial improvements, I got way better results on my machine than in the last runs on https://tfb-status.techempower.com/. If you look at the results there, it seems like they have gotten worse over the past few runs, but I have no idea why.

srcrip avatar Oct 07 '24 20:10 srcrip

@NateBrady23 I think this one should be good to merge and we'll have a follow up based on the Ecto changes that @josevalim did.

atavistock avatar Oct 07 '24 23:10 atavistock

@srcrip some numbers have come out for the run using transactions and it already does better, so I am sure we are on the right track: https://www.techempower.com/benchmarks/#section=test&runid=176ba510-3607-4faa-996e-74f0778b88d4&hw=ph&test=query

Btw, it seems the plain text benchmark is failing on the continuous benchmarks. Any idea why?

josevalim avatar Oct 10 '24 07:10 josevalim

@josevalim I think there's something up with the compression headers on the plaintext benchmark. I tried to properly disable compression in Bandit, as that is what the benchmark requires, but I'm not sure if I made a mistake in the implementation.

Edit: ahh, I didn't see your comment about the char type thing. That makes a lot more sense.

srcrip avatar Oct 10 '24 13:10 srcrip

@srcrip I double checked your compression configs and they look right to me (the one inside :http, we don't need the one at the endpoint level). So it has to be the charset. :)
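
For reference, a minimal sketch of where that option would live, assuming the Bandit adapter and its http_options compress flag (the app, endpoint, and port are placeholders):

   config :my_app, MyAppWeb.Endpoint,
     adapter: Bandit.PhoenixAdapter,
     http: [
       port: 8080,
       http_options: [compress: false]  # turn off response compression negotiation
     ]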

josevalim avatar Oct 10 '24 16:10 josevalim

Okay, I've also pulled the version of Ecto with support for pool_count.

I tested different permutations of count and size, and arrived at 24 pools with 64 connections each, which seems to give the best performance within the connection limit for the Docker container. Having fewer pools started to create contention for connections, while having more pools seemed to reduce throughput. I also tested fewer overall connections (16/64 and 32/32), but 24/64 seemed to be the right balance on my MacBook M2.

atavistock avatar Oct 10 '24 20:10 atavistock

How many cores do you have on your M2? I am thinking that for TechEmpower we probably want the pool count to be double the number of cores (in their case, that would be 56), with 15 connections each?
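
A sketch of that idea, tying the pool count to the host's scheduler count at runtime (the app and repo names and the pool_size are illustrative):

   # e.g. in config/runtime.exs: two pools per scheduler/core,
   # which on a 28-core TechEmpower server gives 56 pools
   config :my_app, MyApp.Repo,
     pool_count: System.schedulers_online() * 2,
     pool_size: 15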

josevalim avatar Oct 10 '24 20:10 josevalim

@NateBrady23 I've seen some links to run ids; do those include these pull requests, and how can I find the most current one?

atavistock avatar Oct 12 '24 17:10 atavistock

@atavistock This will be included in the next full run that starts on https://tfb-status.techempower.com

NateBrady23 avatar Oct 22 '24 14:10 NateBrady23

Awesome job @atavistock and @srcrip, Phoenix is doing much better in the benchmarks now! 🎉 In some of them we are putting up almost 10x what we got before! https://www.techempower.com/benchmarks/#section=test&runid=1300aee6-f9c9-42a2-8d17-252ba597202f&hw=ph&test=db

Although it unfortunately still fails at the plaintext one... 🤔

josevalim avatar Oct 28 '24 22:10 josevalim

Good progress!

First, the good news: I can reproduce the plaintext error locally. Now the bad news: the problem seems to be within Bandit.

Testing with ApacheBench and 160 concurrent workers, 10,000 connections, and zero throttling:

Cowboy with Plug starts to slow down noticeably almost immediately, and after 2,000 requests it is crawling. To its credit, it does eventually return a valid response for every request, with a p95 of 14 seconds.

In contrast, Bandit is clearly faster at lower numbers and blazes through requests, but around 7,000 requests it completely collapses and stops accepting new connections for 30+ seconds. The largest TechEmpower test goes up to 256 workers with 10 threads and sends over 50,000 requests, which is much more than my little load test.

I'm filing an issue with Bandit on this and am open to any ideas on how to help resolve it.
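
For reference, an ApacheBench invocation along those lines would look roughly like this (the URL and port are placeholders; the official runs use a different load generator at much higher concurrency, as noted above):

   ab -c 160 -n 10000 http://localhost:8080/plaintext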


atavistock avatar Oct 29 '24 01:10 atavistock

Great digging! Mat (from Bandit) is super responsive, so please open up an issue and share it with us here (or ping me there, I will be glad to contribute).

josevalim avatar Oct 29 '24 07:10 josevalim

@atavistock btw, do you think it is worth doing a run with pool_count: 128, pool_size: 8 to see if we get even better numbers?

josevalim avatar Oct 29 '24 07:10 josevalim

@josevalim Still trying to figure out why the plaintext test is getting errors. https://github.com/mtrudel/bandit/issues/431

atavistock avatar Nov 28 '24 03:11 atavistock

Awesome, thank you for your continuous effort on this!

josevalim avatar Nov 28 '24 09:11 josevalim

@josevalim / @srcrip / @mtrudel Just thought it was worth pointing out that the new public benchmarks are out.

https://www.techempower.com/benchmarks/#hw=ph&test=composite&section=data-r23

atavistock avatar Mar 05 '25 21:03 atavistock

Those are great improvements across the board. Fantastic work @atavistock and everyone involved!

josevalim avatar Mar 06 '25 09:03 josevalim

Much better! I do wonder, @josevalim and @atavistock, if there's still something slowing down the plaintext benchmark though. One of the Rails benchmarks is slightly outperforming the Phoenix+Bandit one, so it seems like maybe something is still happening there?

srcrip avatar May 26 '25 15:05 srcrip

I see that. The code feels like a bit of a cheat, as they're not really even using Rails, but routing to a proc that returns a canned Rack response. They're doing the same for the JSON test.

From routes.rb:

   PlaintextApp = ->(env) { [200, {'Server' => 'Rails', 'Content-Type' => 'text/plain'}, ['Hello, World!']] }
   get "plaintext", to: PlaintextApp

atavistock avatar May 27 '25 04:05 atavistock

I dropped a comment on the other PR asking for some clarification. If it is ok to forego the Rails request/response objects, then we can likely do something similar and use canned responses from Bandit.
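
Not what the Phoenix benchmark does today, but as a sketch of what a canned response could look like if the rules allow skipping the framework router (the module name is hypothetical):

   defmodule Bench.PlaintextPlug do
     import Plug.Conn

     def init(opts), do: opts

     # return a fixed plaintext body without touching the Phoenix router
     def call(conn, _opts) do
       conn
       |> put_resp_content_type("text/plain")
       |> send_resp(200, "Hello, World!")
     end
   end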

josevalim avatar May 27 '25 07:05 josevalim