Lock free handles
This takes the Handles from the CL I submitted to Go and applies them here. The changes are not that big to the FrankenPHP parts; though I did also remove the "smart pointer" stuff. We just need to be careful to delete handles.
Here's the performance difference for worker mode, visualized by ChatGPT after feeding it 10 benchmarks each:
This is approximately a 9-13% increase in RPS using a very simple PHP script.
I also wrote about the changes to Handles on my blog
Tests are failing for some reason, I'll investigate.
I went ahead and merged #933 into this PR to run a benchmark. It looks like this makes worker/cgi-mode have the same performance characteristics.
Don't merge/review (unless you want to). I submitted a CL to Go and getting good feedback. Already, the CL version is 20-40% faster than this version and doesn't require any changes to the Handles api.
https://go-review.googlesource.com/c/go/+/600875
@withinboredom I've been experimenting a bit with passing pthread_self() instead of creating handles and am seeing similar performance improvements to your branch (going from ~40.000 rps to ~50.000 rps on my 20 core machine).
The basic idea is to keep a sync map on the go side like this:
phpThreads: [
threadId1: {mainRequest *http.Request, workerRequest *http.Request}
threadId2: {mainRequest *http.Request, workerRequest *http.Request}
threadId2: {mainRequest *http.Request, workerRequest *http.Request}
...
]
This allows us eliminate all handles and generally feels like a simpler approach. It might also make extracting metrics easier, since we keep a reference to all requests currently going through the threads. WDYT?
Yes! I don't think you even need a sync.Map as you can just use a slice[nbWorkers] then address the slice. It would be much more efficient and 'lock-free' so long as each worker only modifies its own slice. Also, sync.Map is pretty slow due to dirty tracking, if a sync.Map is required, it is usually faster to just do your own locks and a regular map.
Interesting, I didn't know sync.Map was that inefficient. I'll try making a PR once the filewatcher branch has moved forward..
sync.Map isn't necessarily inefficient, it efficiently does what it advertises, but it only is useful in certain situations, and this isn't one of them (cgo Handles uses sync.Map under the hood). For Handles, sync.Map almost always takes the "slow path" through sync.Map and not the fast path (which only happens when there is contention on the same thread, which is almost never the case for FrankenPHP).
Replaced by https://github.com/dunglas/frankenphp/pull/1073.