
Response refactor

Open JackThomson2 opened this issue 3 years ago • 15 comments

This actually increases performance for static files: static files are now served async, and JSON is stringified on the Rust side.

Breaking change: responding with plain text directly is no longer allowed.

I have also experimented with removing the thread spawn on blocking requests, which makes synchronous requests much faster.
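The thread-spawn removal can be pictured with a small Python sketch (hypothetical names, not Robyn's actual API): async handlers are awaited, while sync handlers are called inline on the event loop instead of being offloaded to a worker thread, avoiding the per-request spawn overhead.

```python
import asyncio
import inspect

async def dispatch(handler, *args):
    """Hypothetical sketch of the idea: await async handlers, call sync
    handlers inline instead of spawning/offloading to a thread."""
    if inspect.iscoroutinefunction(handler):
        return await handler(*args)
    # Previously this path was something like:
    #   await loop.run_in_executor(None, handler, *args)
    return handler(*args)

def sync_handler():
    return "hello"

print(asyncio.run(dispatch(sync_handler)))  # hello
```

The trade-off is that an inline sync handler blocks the event loop while it runs, which is why this wins for quick handlers but would hurt for long-running ones.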

JackThomson2 avatar Jul 26 '21 19:07 JackThomson2

Example speed we can get with sync text requests (PR results):

oha -n 100000 http://localhost:5000/sync
Summary:
  Success rate:	1.0000
  Total:	3.0909 secs
  Slowest:	0.0104 secs
  Fastest:	0.0001 secs
  Average:	0.0015 secs
  Requests/sec:	32352.5699

  Total data:	1.71 MiB
  Size/request:	17 B
  Size/sec:	565.19 KiB

Response time histogram:
  0.000 [3197]  |■■■
  0.001 [16553] |■■■■■■■■■■■■■■■■■■■
  0.001 [27004] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [22691] |■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [13940] |■■■■■■■■■■■■■■■■
  0.002 [7593]  |■■■■■■■■
  0.003 [3976]  |■■■■
  0.003 [2006]  |■■
  0.004 [1128]  |■
  0.004 [599]   |
  0.005 [1313]  |■

Latency distribution:
  10% in 0.0007 secs
  25% in 0.0010 secs
  50% in 0.0014 secs
  75% in 0.0019 secs
  90% in 0.0025 secs
  95% in 0.0030 secs
  99% in 0.0045 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0025 secs, 0.0009 secs, 0.0037 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 100000 responses

Current master result:

oha -n 100000 http://localhost:5001/sync
Summary:
  Success rate:	1.0000
  Total:	3.6854 secs
  Slowest:	0.0123 secs
  Fastest:	0.0001 secs
  Average:	0.0018 secs
  Requests/sec:	27134.0017

  Total data:	1.81 MiB
  Size/request:	19 B
  Size/sec:	503.46 KiB

Response time histogram:
  0.000 [3007]  |■■■
  0.001 [13101] |■■■■■■■■■■■■■■■
  0.001 [26798] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [25279] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [15623] |■■■■■■■■■■■■■■■■■■
  0.003 [8099]  |■■■■■■■■■
  0.003 [4122]  |■■■■
  0.004 [2058]  |■■
  0.004 [919]   |■
  0.005 [446]   |
  0.005 [548]   |

Latency distribution:
  10% in 0.0009 secs
  25% in 0.0013 secs
  50% in 0.0017 secs
  75% in 0.0022 secs
  90% in 0.0029 secs
  95% in 0.0034 secs
  99% in 0.0045 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0017 secs, 0.0008 secs, 0.0023 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [200] 100000 responses

JackThomson2 avatar Jul 26 '21 20:07 JackThomson2

Hey @JackThomson2, thank you for your PR. I have limited access at the moment, so I will do a complete review on the 2nd.

Can you give an example of the current main branch on your machine, so that it is easier to compare with this:

Example speed we can get with sync text requests:

sansyrox avatar Jul 27 '21 15:07 sansyrox

Also, I think if we use the catch-all method mentioned in the PyO3 docs, we can remove the need to wrap text responses in the text function. (screenshot of the PyO3 docs attached)
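I haven't verified the exact PyO3 API being referenced here, but the effect of a catch-all — accepting a raw string, bytes, or dict from the handler without requiring a text() wrapper — can be sketched in Python (hypothetical helper name):

```python
import json

def to_response(result):
    """Hypothetical catch-all: normalize whatever the handler returns,
    so plain text no longer needs wrapping in text()."""
    if isinstance(result, str):
        return result.encode(), "text/plain"
    if isinstance(result, bytes):
        return result, "application/octet-stream"
    if isinstance(result, dict):
        return json.dumps(result).encode(), "application/json"
    raise TypeError(f"unsupported response type: {type(result).__name__}")

print(to_response("hello"))  # (b'hello', 'text/plain')
```

On the Rust side this would correspond to trying each extraction in turn when converting the Python return value, which is presumably where the performance penalty mentioned later comes from.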

sansyrox avatar Jul 27 '21 15:07 sansyrox

Also, I think if we use the catch-all method mentioned in the PyO3 docs, we can remove the need to wrap text responses in the text function.

That is awesome, I completely missed that. I'll take a look at it later.

JackThomson2 avatar Jul 27 '21 15:07 JackThomson2

@sansyrox I updated the comment with results from master. Interestingly, this holds even though the new version does extra work, since it performs more checks in the flow.

JackThomson2 avatar Jul 27 '21 18:07 JackThomson2

I've been playing with a few less safe options, which give crazy results, as below: pushing 40k requests per second.

oha -n 100000 http://localhost:5000/static
Summary:
  Success rate:	1.0000
  Total:	2.4989 secs
  Slowest:	0.0083 secs
  Fastest:	0.0001 secs
  Average:	0.0012 secs
  Requests/sec:	40017.9414

  Total data:	0 B
  Size/request:	0 B
  Size/sec:	0 B

Response time histogram:
  0.000 [3409]  |■■■■
  0.001 [13039] |■■■■■■■■■■■■■■■■
  0.001 [25930] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [25727] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [16145] |■■■■■■■■■■■■■■■■■■■
  0.002 [8100]  |■■■■■■■■■
  0.002 [4040]  |■■■■
  0.003 [1848]  |■■
  0.003 [808]   |
  0.003 [391]   |
  0.004 [563]   |

Latency distribution:
  10% in 0.0006 secs
  25% in 0.0009 secs
  50% in 0.0012 secs
  75% in 0.0015 secs
  90% in 0.0019 secs
  95% in 0.0023 secs
  99% in 0.0030 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0023 secs, 0.0014 secs, 0.0032 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0001 secs

JackThomson2 avatar Jul 27 '21 18:07 JackThomson2

@sansyrox I updated the comment with results from master. Interestingly, this holds even though the new version does extra work, since it performs more checks in the flow.

Nicee 🔥

I've been playing with a few less safe options, which give crazy results, as below: pushing 40k requests per second.

I have usually stayed away from the unsafe Rust options, as I chose Rust for its safety as well as its performance. But how unsafe are we talking?

sansyrox avatar Jul 28 '21 05:07 sansyrox

I have usually stayed away from the unsafe Rust options, as I chose Rust for its safety as well as its performance. But how unsafe are we talking?

So what I've been doing is using the assume_gil_acquired method, which pretends the GIL is held. I've used this on objects passed in as parameters, so to my understanding we shouldn't hit any threading issues, as the value won't be modified after it's sent. I still lock on the function call.

JackThomson2 avatar Jul 28 '21 08:07 JackThomson2

So what I've been doing is using the assume_gil_acquired method, which pretends the GIL is held. I've used this on objects passed in as parameters, so to my understanding we shouldn't hit any threading issues, as the value won't be modified after it's sent. I still lock on the function call.

Oh, then it is not a big issue. I used to do that as well, but when I talked with the PyO3 folks, they suggested using py.allow_threads instead, since I would have to acquire the GIL anyway (for the function call, in your case). So, as long as we can ensure the app won't crash, it is not an issue.
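The safety argument here (the value is never modified after it's handed over) is language-agnostic; it is not the actual PyO3 mechanism, but the same send-then-never-touch discipline can be illustrated in plain Python:

```python
import queue
import threading

# Sketch of the argument above: once a value is handed to the worker and
# never mutated afterwards, the reader cannot observe a partial update,
# regardless of how coarse or fine the locking is.
q = queue.Queue()
results = []

def worker():
    while True:
        item = q.get()
        if item is None:  # sentinel: stop the worker
            break
        results.append(item.upper())

t = threading.Thread(target=worker)
t.start()
for word in ("hello", "world"):
    q.put(word)  # the sender never touches `word` again after put()
q.put(None)
t.join()
print(results)  # ['HELLO', 'WORLD']
```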

sansyrox avatar Jul 28 '21 14:07 sansyrox

Here's an example of what my experiment is now pushing:

oha -n 100000 http://localhost:5000/static
Summary:
  Success rate: 1.0000
  Total:        1.3311 secs
  Slowest:      0.0129 secs
  Fastest:      0.0000 secs
  Average:      0.0007 secs
  Requests/sec: 75123.6686

  Total data:   585.94 KiB
  Size/request: 6 B
  Size/sec:     440.18 KiB

We're now near native Rust performance, which is crazy: basically a 2x perf boost over current master. Native Rust gave 90k req/sec.

JackThomson2 avatar Jul 28 '21 14:07 JackThomson2

@JackThomson2, this is very crazy indeed. What kind of performance are you getting from a native actix server on your machine?

sansyrox avatar Jul 28 '21 17:07 sansyrox

Also, I think if we use the catch-all method mentioned in the PyO3 docs, we can remove the need to wrap text responses in the text function.

@JackThomson2, did you take a look at this?

sansyrox avatar Aug 02 '21 11:08 sansyrox

Hi, yes, sorry, I've been a bit busy. I did manage to get that working, but it incurred a performance penalty. I was trying to figure out a way to remove that overhead.

JackThomson2 avatar Aug 03 '21 10:08 JackThomson2

Hi @JackThomson2 👋 . Did you get a chance to look at this PR?

sansyrox avatar Aug 17 '21 03:08 sansyrox

Eight months ago:

Hi @JackThomson2 👋. Did you get a chance to look at this PR?

If we can close this merge request, please do.

stappersg avatar Apr 30 '22 21:04 stappersg

Stale and implemented in v0.37.0

sansyrox avatar Aug 01 '23 21:08 sansyrox