falcon
Measuring a performance bottleneck
We use Scout APM to monitor performance.
It seems Falcon and Puma take different approaches to handling requests.
Falcon has much higher queue time (the yellow part of the chart, i.e. the time before a request starts being processed) and low processing time, as if requests are blocked outside the server waiting for entry.
Puma has much higher ActiveRecord time (the green part of the chart) and low queue time.
Both become slow during the benchmark test and have similar response times.
Currently we're able to increase Falcon's throughput by using 8 processes per 4-CPU machine, instead of the original 5 processes.
Is there any way to probe the situation/bottleneck in Falcon?
If you are doing high-latency blocking operations in the event loop, you will see this kind of behaviour.
Because the core of the event loop for the server is:
connection = accept_connection
connection.each_request do |request|
  response = process(request)
  connection.send_response(response)
end
It's not quite that simple but that's generally how it fits together.
If you are blocking in process(request), we cannot receive new requests (e.g. multiplexing à la HTTP/2), nor can we accept more connections.
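The effect on queue time can be sketched in plain Ruby (no Falcon involved; `sleep` stands in for any blocking operation):

```ruby
# A minimal sketch of why a blocking call inflates queue time: in a
# single-threaded loop, every queued request waits for the blocking one
# ahead of it to finish, even if its own work is instant.
start = Time.now

# Three "requests": the first blocks for 0.2s, the others are instant.
work_durations = [0.2, 0.0, 0.0]

queue_times = work_durations.map do |duration|
  queued_for = Time.now - start  # how long this request sat in the queue
  sleep(duration)                # stand-in for a blocking operation
  queued_for
end

# Later requests accumulate queue time they did nothing to cause.
puts queue_times.map { |t| t.round(1) }.inspect
```

This matches the chart you describe: requests spend their time waiting for entry rather than being processed.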
You need to identify the blocking operation, probably a database query, and then decide whether async-postgres or async-mysql is mature enough to work in your application.
If you have blocking operations that you simply can't avoid, you can spin up a thread and use Async::IO::Notification to handle reactor <-> thread synchronisation. I can give you some example code.
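A simplified illustration of that pattern, using a plain Thread and Queue as a stand-in (in a real Falcon app the reactor would wait on an Async::IO::Notification instead of Queue#pop, so it can keep serving other requests while the thread works):

```ruby
# Hypothetical sketch: run a blocking operation on a worker thread so the
# main loop (the "reactor" here) is not stalled by it directly. In Falcon
# you would signal an Async::IO::Notification when the thread finishes,
# letting the reactor wait without blocking.
def run_in_thread
  results = Queue.new
  Thread.new do
    results << yield  # the blocking operation runs off the main loop
  end
  results.pop  # NOTE: this pop itself blocks; the notification avoids that
end

value = run_in_thread { 21 * 2 }
puts value  # => 42
```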
Is starting Falcon in hybrid mode, so that threads handle requests, effectively the same thing?
Also, I'd like to know how the connection pool plays into this. Does it really help in hybrid or fork mode if requests are considered to block the event loop reactor?
That is a good question.
Yes, hybrid mode should give you mostly the same performance characteristics as Puma's cluster mode.
However, ideally you use non-blocking adapters; otherwise there are still cases where you can experience high latency, e.g. when two connections are handled by the same reactor on the same thread.
Process Model
One parent process spawns N child processes, one reactor per child process.
Thread Model
One parent process spawns N threads, one reactor per thread. GVL contention.
Hybrid Model
One parent process spawns N processes, and each process makes M threads, one reactor per thread. GVL contention, but more threads = better handling of blocking operations.
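The hybrid model can be sketched with plain Process.fork and Thread (a toy illustration only; Falcon's actual container logic lives in async-container):

```ruby
# Toy sketch of the hybrid model: N forked worker processes, each spawning
# M threads, where each thread would run its own reactor. Requires a
# platform with fork (Linux/macOS MRI).
N_PROCESSES = 2
M_THREADS = 3

pids = N_PROCESSES.times.map do
  Process.fork do
    threads = M_THREADS.times.map do
      Thread.new do
        # each thread would run one reactor / event loop here
      end
    end
    threads.each(&:join)
  end
end

pids.each { |pid| Process.wait(pid) }
puts "spawned #{pids.length} processes x #{M_THREADS} threads each"
```

Forking sidesteps GVL contention between processes, while the threads inside each process trade some GVL contention for better tolerance of blocking operations.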
Let me know if you need further clarifications - happy to discuss.