Robyn icon indicating copy to clipboard operation
Robyn copied to clipboard

Scaling to multiple cores is poor

Open JackThomson2 opened this issue 3 years ago • 25 comments

We should experiment with ways to speed this up

JackThomson2 avatar Jul 28 '21 09:07 JackThomson2

@JackThomson2 , doesn't tokio automatically handle this?

sansyrox avatar Jul 28 '21 14:07 sansyrox

It does but we're bottlenecked by the GIL, in testing manually dropping the number of threads actually improves the performance

JackThomson2 avatar Jul 28 '21 14:07 JackThomson2

@JackThomson2 , what do you mean by "dropping threads manually"?

sansyrox avatar Jul 28 '21 17:07 sansyrox

Setting the number_workers parameter when making the actix server

JackThomson2 avatar Jul 29 '21 09:07 JackThomson2

@JackThomson2 , I was thinking to give the developer the option of setting the number of workers.

sansyrox avatar Jul 31 '21 10:07 sansyrox

We should be able to scale up in number of cores really though. I've been experimenting with a multiprocessing approach

JackThomson2 avatar Aug 03 '21 10:08 JackThomson2

I've been experimenting with a multiprocessing approach

@JackThomson2 , this statement always makes me excited 😅

sansyrox avatar Aug 03 '21 19:08 sansyrox

@sansyrox so far so good news, here's a demo of us sharing the same socket between multi processes. https://github.com/JackThomson2/shared_sockets

JackThomson2 avatar Aug 04 '21 18:08 JackThomson2

Hey @JackThomson2 . I went through the codebase and it looks amazing!

I have one question, are you planning to handle HTTP requests manually over this? Or are you trying to just replace the Actix Socket that we use atm?

sansyrox avatar Aug 04 '21 20:08 sansyrox

We can pass these sockets into the actix server, so we could spin up an actix server per process and share that socket between the process

JackThomson2 avatar Aug 04 '21 20:08 JackThomson2

@JackThomson2 got it. This makes sense. Amazing work! 🔥

sansyrox avatar Aug 04 '21 21:08 sansyrox

But the key awesome part is we can acquire the GIL once and hold onto it as we'd have 1 worker per thread reducing overhead.

JackThomson2 avatar Aug 05 '21 08:08 JackThomson2

@JackThomson2 , do you mean that we will have one GIL per process? Or a single GIL will moderate every process and they will spawn their child threads?

sansyrox avatar Aug 05 '21 12:08 sansyrox

Each process has its own separate GIL which means we can have more, and as we established python is currently the bottleneck it should help ease that.

JackThomson2 avatar Aug 05 '21 16:08 JackThomson2

@sansyrox I have a working example here: https://github.com/JackThomson2/robyn/tree/multiprocess

JackThomson2 avatar Aug 09 '21 08:08 JackThomson2

@JackThomson2 , thank you for the great work! ✨

I have taken a look at it and I have a few questions:

  1. Did you compare the performance with the original code?
  2. https://github.com/JackThomson2/robyn/commit/6616b9aa892246cc766b04815932b6c65ab82d1b#diff-48d27698cc708c7efe770fd897e4885efaa0a15cae30d54936068603ba03bbd9R70 , why have you set workers=1 here?
  3. Why is the loop iterating in the range of 2 ?

Thanks again! 😄

sansyrox avatar Aug 09 '21 08:08 sansyrox

@sansyrox

  1. Perf wasn't as great as I was originally expecting it was only around 20% ish increase, I'll play around with the code to try and figure the bottleneck
  2. The idea was to have one worker per core, so we spin up a process per core.
  3. This is as my machine is only a dual core, I haven't changed it to the num of physical cpus

I've been experimenting with an alternative routing library, which I'll hopefully get pushed soon

Jack

JackThomson2 avatar Aug 09 '21 08:08 JackThomson2

Perf wasn't as great as I was originally expecting it was only around 20% ish increase, I'll play around with the code to try and figure the bottleneck

@JackThomson2 , has the performance increased to 120% of original or has it dropped down to 20% of original?

This is as my machine is only a dual core, I haven't changed it to the num of physical cpus

You can use this instead of hard coding it.

import multiprocessing

multiprocessing.cpu_count()

The idea was to have one worker per core, so we spin up a process per core.

Why not multiple processes per core? Is it because they will compete against each other?

sansyrox avatar Aug 09 '21 10:08 sansyrox

@sansyrox 120% of original, but I'd like that to be higher. I will look into that need to make sure that returns physical cores rather than virtual cores.

JackThomson2 avatar Aug 10 '21 08:08 JackThomson2

@sansyrox Ever since the updating the pyo3 in the new release performance has dropped fairly significantly. We should be catching this before releasing versions. It dropped from 27k per second, to about 19k for non async functions

JackThomson2 avatar Aug 12 '21 12:08 JackThomson2

@JackThompson2 , this is very strange. It is the same for me. I tried using robyn 0.5.3 and 0.6.0 on the same machine. Both of them are serving around 22000 req/sec for me. Do you think that there could be a possibility that a function could be a bottleneck on our code?

On Thu, Aug 12, 2021 at 12:54 PM Jack Thomson @.***> wrote:

@sansyrox https://github.com/sansyrox Ever since the updating the pyo3 in the new release performance has dropped fairly significantly. We should be catching this before releasing versions. It dropped from 27k per second, to about 19k for non async functions

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sansyrox/robyn/issues/58#issuecomment-897614878, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHEOIBRTOKLWKXWZXFWCTETT4O777ANCNFSM5BEAHRXA .

sansyrox avatar Aug 12 '21 13:08 sansyrox

@sansyrox Yes but 0.5.3 wasn't using actix, before the upgrade of pyo3 and async py it was hitting 27k

JackThomson2 avatar Aug 12 '21 13:08 JackThomson2

@JackThomson2 , that is strange. But it was a mandatory update tho, the builds were failing on pyo3 v0.13.* .

sansyrox avatar Aug 12 '21 14:08 sansyrox

@JackThomson2 , I think I may know a reason for the slowness.

sansyrox avatar Aug 12 '21 16:08 sansyrox

@JackThomson2 , did you get a chance to look at this again?

@sansyrox 120% of original, but I'd like that to be higher. I will look into that need to make sure that returns physical cores rather than virtual cores.

sansyrox avatar Aug 21 '21 11:08 sansyrox