
feat(server): spawn task sooner in listenloop

pankgeorg opened this pull request 2 years ago • 9 comments

  • ~Uses Threads.@spawn instead of @async in listenloop~
  • moves the SSL handshake into the spawned task, so that the next accept happens sooner
  • adds a 5-second timeout to the SSL handshake
  • makes SSL lockable, because I don't trust MbedTLS to work in a potentially multi-threaded environment
  • makes the conns dict lockable, because if I understand correctly it's also not thread safe on its own

(this work has a lot of ideas and snippets from @tanmaykm 🙏🏽)
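The bullets above can be sketched roughly as follows. This is a hypothetical illustration of the described change, not HTTP.jl's actual code: names such as `conns`, `CONNS_LOCK`, and `handler` are illustrative, and the SSL handshake itself is only indicated by a comment.

```julia
# Hypothetical sketch: accept a connection, then immediately spawn a task
# that does the (potentially slow) SSL handshake and handling, so the loop
# can get back to `accept` sooner.
using Sockets

const conns = Dict{Int,TCPSocket}()  # live connections; a plain Dict is not thread safe
const CONNS_LOCK = ReentrantLock()   # so guard it whenever spawned tasks touch it

function listenloop(handler, server::Sockets.TCPServer)
    id = 0
    while isopen(server)
        io = accept(server)       # return to `accept` as quickly as possible
        id += 1
        let io = io, id = id
            Threads.@spawn begin  # handshake + handling run off the accept loop
                try
                    # the SSL handshake would happen here, bounded by a
                    # ~5 second timeout as the PR description says
                    lock(() -> conns[id] = io, CONNS_LOCK)
                    handler(io)
                finally
                    lock(() -> delete!(conns, id), CONNS_LOCK)
                    close(io)
                end
            end
        end
    end
end
```

The key structural point is that only `accept` stays on the listening task; everything that can block (handshake, request handling) moves into the spawned task, and any state shared with those tasks (here the `conns` dict) is accessed under a lock.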

pankgeorg avatar Sep 02 '23 07:09 pankgeorg

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (15c6451) 82.71% compared to head (e608d08) 82.57%.

Files Patch % Lines
src/Servers.jl 81.48% 5 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1102      +/-   ##
==========================================
- Coverage   82.71%   82.57%   -0.14%     
==========================================
  Files          32       32              
  Lines        3054     3065      +11     
==========================================
+ Hits         2526     2531       +5     
- Misses        528      534       +6     

:umbrella: View full report in Codecov by Sentry.

codecov-commenter avatar Sep 02 '23 07:09 codecov-commenter

Uses Threads.@spawn instead of @async in listenloop

I think using @async rather than @spawn was a conscious decision, but I can't remember why now -- so hopefully @quinnj remembers 😊

nickrobinson251 avatar Sep 02 '23 09:09 nickrobinson251

Uses Threads.@spawn instead of @async in listenloop

I think using @async rather than @spawn was a conscious decision, but I can't remember why now -- so hopefully @quinnj remembers 😊

Looking at old PRs, it may be either that Threads.@spawn is more expensive, or that the connections dict is not thread safe (I now lock it). And, of course, now that I think of it, this is definitely a breaking change, because the user's f in handle_connection is definitely not 100% thread safe; that's probably best suited for another PR.

pankgeorg avatar Sep 02 '23 10:09 pankgeorg

And, of course, now that I think of it, this is definitely a breaking change, because the user's f in handle_connection is definitely not 100% thread safe; that's probably best suited for another PR.

Yes, i think that's it. And i think we rely on that in our internal Server code (at RAI)... i see this comment in some internal server code

# HTTP.serve! uses `@async` under the hood. Because of `@async` and because there are no
# yield points during server state manipulation, we can modify the server state within                  
# `handle_*` methods without having to worry about thread safety, like we would if
# HTTP.serve! used `@spawn` to handle requests. Actual work is done in the `@spawn`ed                        
# tasks which **don't** access the `Server`.
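A minimal illustration of the distinction that comment relies on (hypothetical example, not HTTP.jl or RAI code): `@async` tasks stay on the spawning thread and only interleave at yield points, so state mutations with no yield point inside them need no lock, whereas `Threads.@spawn` tasks may run in parallel on other threads and shared state must be locked.

```julia
state = Ref(0)

# Safe without a lock: each increment contains no yield point, and all
# `@async` tasks run on the same thread, only switching at yield/IO points.
t1 = [@async(state[] += 1) for _ in 1:100]
foreach(wait, t1)
@assert state[] == 100

# `Threads.@spawn` tasks may run in parallel, so the same increment must be
# guarded by a lock to avoid a data race.
lk = ReentrantLock()
t2 = [Threads.@spawn(lock(() -> state[] += 1, lk)) for _ in 1:100]
foreach(wait, t2)
@assert state[] == 200
```

This is why switching `@async` to `Threads.@spawn` is a behavioral (breaking) change for any downstream code that mutates server state from handlers without locking.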

nickrobinson251 avatar Sep 02 '23 14:09 nickrobinson251

Makes sense. I think the spirit of this PR doesn't need Threads.@spawn. I'll have another one in the making with another idea that may fit better. Thanks for taking a look 🤗

pankgeorg avatar Sep 02 '23 14:09 pankgeorg

I'm not sure how to fix the invalidations :(

~Also, it seems that not locking MbedTLS breaks down everything, with the listening socket ending up in a FIN_WAIT2. Locking on the other hand, seems to introduce a significant slowdown. I'll convert this to a draft for now, until I have more time to work on it.~ nvm, just had a bug.

pankgeorg avatar Sep 07 '23 12:09 pankgeorg

Sorry for the confusion, I just missed an UndefVarError for a second. @quinnj I'm not sure how to fix the invalidations issue, but the PR should be ready for review other than that. Unfortunately, I can't get a performance improvement out of it, but it's not slower either (both this PR and #master are around 35 requests/second)

pankgeorg avatar Sep 07 '23 14:09 pankgeorg

Benchmarked it on a simple hello world endpoint with ab
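For context, a plausible ApacheBench invocation that produces output in the shape shown below; the URL, request count, and concurrency level are assumptions, not taken from this thread.

```shell
# -n: total number of requests, -c: number of concurrent clients
ab -n 1000 -c 100 https://127.0.0.1:8443/
```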

before

Requests per second:    54.70 [#/sec] (mean)
Time per request:       1827.989 [ms] (mean)
Time per request:       18.280 [ms] (mean, across all concurrent requests)
Transfer rate:          10.47 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    8  14.0      1      51
Processing:    78 1329 292.0   1432    1856
Waiting:       78 1328 292.0   1432    1847
Total:         98 1336 283.7   1435    1858

after

Requests per second:    69.11 [#/sec] (mean)
Time per request:       1447.051 [ms] (mean)
Time per request:       14.471 [ms] (mean, across all concurrent requests)
Transfer rate:          13.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    8  14.6      1      53
Processing:    73 1293 315.3   1419    1466
Waiting:       63 1293 315.4   1419    1466
Total:         75 1302 306.4   1421    1513

krynju avatar Sep 07 '23 15:09 krynju

we rely on that in our internal Server code (at RAI)...

To clarify, we're still in favour of changing @async -> @spawn.

At RAI we currently have some code that is written based on the assumption that HTTP.jl uses @async here. We'd love to see HTTP.jl move away from @async and use @spawn instead; we'd just want that to be done in a breaking release (and of course we'd need to rewrite some internal code before upgrading to that new HTTP.jl version).

nickrobinson251 avatar Mar 26 '24 17:03 nickrobinson251