Errors in production: socket failures
Hi there, I migrated a Rails app to Falcon a few months ago.
Suddenly I've started to see weird errors that aren't being reported by my error reporting service, but users noticed that some realtime features (socket connections) failed.
The logs look like this:
{"time":"2024-10-17T02:38:15+00:00","severity":"warn","oid":3028920,"pid":1,"subject":"Async::Task","annotation":"Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.","message":["Task may have ended with unhandled exception.","undefined method `write' for nil:NilClass"],"error":{"kind":"NoMethodError","message":"undefined method `write' for nil:NilClass","stack":"/usr/local/bundle/gems/protocol-http1-0.18.0/lib/protocol/http1/connection.rb:134:in `write_response'\n/usr/local/bundle/gems/async-http-0.63.0/lib/async/http/protocol/http1/server.rb:63:in `each'\n/usr/local/bundle/gems/async-http-0.63.0/lib/async/http/server.rb:40:in `accept'\n/usr/local/bundle/gems/async-io-1.41.0/lib/async/io/server.rb:15:in `block in accept_each'\n/usr/local/bundle/gems/async-io-1.41.0/lib/async/io/socket.rb:58:in `block in accept'\n/usr/local/bundle/gems/async-2.8.2/lib/async/task.rb:161:in `block in run'\n/usr/local/bundle/gems/async-2.8.2/lib/async/task.rb:331:in `block in schedule'\n"}}
Not sure what could be happening here, but it seems related to the async HTTP gems.
The installed dependencies were:
async (2.8.2)
console (~> 1.10)
fiber-annotation
io-event (~> 1.1)
timers (~> 4.1)
async-container (0.16.13)
async
async-io
async-http (0.63.0)
async (>= 1.25)
async-io (>= 1.28)
async-pool (>= 0.2)
protocol-http (~> 0.26.0)
protocol-http1 (~> 0.18.0)
protocol-http2 (~> 0.16.0)
traces (>= 0.10.0)
async-http-cache (0.4.3)
async-http (~> 0.56)
async-io (1.41.0)
async
async-pool (0.4.0)
async (>= 1.25)
Thanks for your report, I'll investigate it.
Probably a good first step would be to upgrade to the latest version of Ruby, async-http and falcon as well as all the other dependencies.
When I upgraded to the latest Falcon, WEBrick started in development. Maybe I'm missing something.
@michelson can you show me exactly what command you are running?
Hi Samuel, rails s in development.
Do you have both puma and falcon in your gemfile.lock?
No, just Falcon in the Gemfile. It also happened to me on a new Rails app. Steps:
- remove puma
- add falcon
- rails s starts WEBrick.
Okay, let me check what's going on.
samuel@aiko ~/tmp> rails new foob
create
...
run bundle install --quiet
samuel@aiko ~/tmp> cd foob
samuel@aiko ~/t/foob (main)> bundle remove puma
Removing gems from /home/samuel/tmp/foob/Gemfile
puma (>= 5.0) was removed.
samuel@aiko ~/t/foob (main)> bundle add falcon
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
samuel@aiko ~/t/foob (main)> bin/rails s
=> Booting Falcon v0.48.3
=> Rails 7.2.1.2 application starting in development http://localhost:3000
=> Run `bin/rails server --help` for more startup options
At least with this simple example, I could not reproduce it. Do you mind showing me your output?
also seeing similar errors:
{
"time": "2025-03-06T23:10:24+00:00",
"severity": "warn",
"oid": 55408,
"pid": 19,
"subject": "Async::Task",
"annotation": "Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.",
"message": "Task may have ended with unhandled exception.",
"event": {
"type": "failure",
"root": "/app",
"class": "Errno::EPIPE",
"message": "Broken pipe",
"backtrace": [
"/usr/local/lib/ruby/3.4.0/socket.rb:461:in 'BasicSocket#__write_nonblock'",
"/usr/local/lib/ruby/3.4.0/socket.rb:461:in 'BasicSocket#write_nonblock'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:93:in 'IO::Stream::Buffered#syswrite'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:184:in 'IO::Stream::Generic#drain'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:196:in 'block in IO::Stream::Generic#flush'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in 'Thread::Mutex#synchronize'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in 'IO::Stream::Generic#flush'",
"/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:527:in 'Protocol::HTTP1::Connection#write_chunked_body'",
"/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:598:in 'Protocol::HTTP1::Connection#write_body'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:133:in 'block in Async::HTTP::Protocol::HTTP1::Server#each'",
"/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:330:in 'Async::Task#defer_stop'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:80:in 'Async::HTTP::Protocol::HTTP1::Server#each'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/server.rb:50:in 'Async::HTTP::Server#accept'",
"/usr/local/bundle/gems/falcon-0.51.0/lib/falcon/server.rb:57:in 'Falcon::Server#accept'",
"/usr/local/bundle/gems/io-endpoint-0.15.2/lib/io/endpoint/wrapper.rb:216:in 'block (2 levels) in IO::Endpoint::Wrapper#accept'",
"/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:200:in 'block in Async::Task#run'",
"/usr/local/bundle/gems/async-2.23.0/lib/async/task.rb:438:in 'block in Async::Task#schedule'"
]
}
}
These must be inbound requests. Could it be from timeouts configured on the upstream server? We have Faraday making requests from an upstream server to our Falcon server with a 15 second timeout, and requests are expected to occasionally time out based on slow requests to other servers downstream from the Falcon server.
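To make the setup concrete, here is a minimal sketch of the kind of upstream client described above (the URL is a placeholder, not our real host):

require "faraday"

# Upstream service calling our Falcon app with a 15 second overall timeout.
# When a request to a downstream server is slow, this client gives up and
# closes the connection before Falcon finishes writing the response.
upstream = Faraday.new(url: "https://falcon-app.internal.example", request: {timeout: 15})

response = upstream.get("/slow-endpoint")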
Huh, I came in to report the exact same thing:
{
"time": "2025-03-11T16:07:55+00:00",
"severity": "warn",
"oid": 90828100,
"pid": 388,
"subject": "Async::Task",
"annotation": "Reading HTTP/1.1 requests for Async::HTTP::Protocol::HTTP1::Server.",
"message": "Task may have ended with unhandled exception.",
"event": {
"type": "failure",
"root": "/root/workspace/web",
"class": "Errno::EPIPE",
"message": "Broken pipe",
"backtrace": [
"/usr/local/lib/ruby/3.3.0/socket.rb:461:in `__write_nonblock'",
"/usr/local/lib/ruby/3.3.0/socket.rb:461:in `write_nonblock'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:93:in `syswrite'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:184:in `drain'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:196:in `block in flush'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in `synchronize'",
"/usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:195:in `flush'",
"/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:527:in `write_chunked_body'",
"/usr/local/bundle/gems/protocol-http1-0.30.0/lib/protocol/http1/connection.rb:598:in `write_body'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:133:in `block in each'",
"/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:330:in `defer_stop'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/protocol/http1/server.rb:80:in `each'",
"/usr/local/bundle/gems/async-http-0.87.0/lib/async/http/server.rb:50:in `accept'",
"/usr/local/bundle/gems/falcon-0.51.1/lib/falcon/server.rb:57:in `accept'",
"/usr/local/bundle/gems/io-endpoint-0.15.2/lib/io/endpoint/wrapper.rb:216:in `block (2 levels) in accept'",
"/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:200:in `block in run'",
"/usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:438:in `block in schedule'"
]
}
}
What we're seeing are seemingly random flurries of these errors. They don't seem to align with any kind of burst in traffic, and during these events CPU jumps to near 100% and the errors flood the logs for a few minutes. Most of the workers restart on their own and things move along until the whole thing happens again. During these events, Falcon seems to stop serving requests.
Like @jakeonfire we are also using Faraday, but not for all requests. Faraday (with the async-http adapter) is the HTTP client we're using for upstream search requests (OpenSearch).
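For reference, a minimal sketch of that kind of Faraday setup (the OpenSearch URL is a placeholder; our real configuration has more middleware):

require "faraday"
require "async/http/faraday"

# Search client using the async-http adapter, so upstream OpenSearch requests
# run on Falcon's event loop rather than blocking a thread.
search_client = Faraday.new(url: "https://opensearch.internal.example:9200") do |f|
  f.adapter :async_http
end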
I've not seen any difference between Ruby 3.3.7 or 3.4.2.
@ioquatix Do you have any idea what could be going on?
Thanks for all the logs and data; I will review them this week.
FWIW, ours are fairly steady:
and there is no evidence that they are causing a bigger issue, strangely.
We just had one of these events I was referring to. For us, it looks like this:
And request times during this:
It turns out there is often also the following error during these spikes:
/usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:305:in 'IO::Event::Selector::EPoll#io_wait': execution expired (Faraday::TimeoutError)
from /usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:305:in 'Async::Scheduler#io_wait'
from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:118:in 'IO#wait_readable'
from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/buffered.rb:118:in 'IO::Stream::Buffered#sysread'
from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:327:in 'IO::Stream::Generic#fill_read_buffer'
from /usr/local/bundle/gems/io-stream-0.6.1/lib/io/stream/generic.rb:164:in 'IO::Stream::Generic#gets'
from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:350:in 'Protocol::HTTP1::Connection#read_line?'
from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:365:in 'Protocol::HTTP1::Connection#read_line'
from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:424:in 'Protocol::HTTP1::Connection#read_response_line'
from /usr/local/bundle/gems/protocol-http1-0.34.0/lib/protocol/http1/connection.rb:451:in 'Protocol::HTTP1::Connection#read_response'
from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/protocol/http1/response.rb:15:in 'Async::HTTP::Protocol::HTTP1::Response.read'
from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/protocol/http1/client.rb:86:in 'Async::HTTP::Protocol::HTTP1::Client#call'
from /usr/local/bundle/gems/protocol-http-0.49.0/lib/protocol/http/request.rb:87:in 'Protocol::HTTP::Request#call'
from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/client.rb:146:in 'Async::HTTP::Client#make_response'
from /usr/local/bundle/gems/async-http-0.88.0/lib/async/http/client.rb:104:in 'Async::HTTP::Client#call'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:205:in 'block (2 levels) in Async::HTTP::Faraday::Adapter#perform_request'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:242:in 'block in Async::HTTP::Faraday::Adapter#with_timeout'
from /usr/local/bundle/gems/async-2.23.1/lib/async/scheduler.rb:562:in 'Async::Scheduler#with_timeout'
from /usr/local/bundle/gems/async-2.23.1/lib/async/task.rb:141:in 'Async::Task#with_timeout'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:241:in 'Async::HTTP::Faraday::Adapter#with_timeout'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:191:in 'block in Async::HTTP::Faraday::Adapter#perform_request'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:233:in 'block (2 levels) in Async::HTTP::Faraday::Adapter#with_client'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/clients.rb:107:in 'Async::HTTP::Faraday::PersistentClients#with_client'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/clients.rb:162:in 'Async::HTTP::Faraday::PerThreadPersistentClients#with_client'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:232:in 'block in Async::HTTP::Faraday::Adapter#with_client'
from /usr/local/bundle/gems/async-2.23.1/lib/kernel/sync.rb:24:in 'Kernel#Sync'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:222:in 'Async::HTTP::Faraday::Adapter#with_client'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:170:in 'Async::HTTP::Faraday::Adapter#perform_request'
from /usr/local/bundle/gems/async-http-faraday-0.21.0/lib/async/http/faraday/adapter.rb:161:in 'Async::HTTP::Faraday::Adapter#call'
from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:43:in 'block (2 levels) in OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace/tracer.rb:37:in 'block in OpenTelemetry::Trace::Tracer#in_span'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace.rb:70:in 'block in OpenTelemetry::Trace#with_span'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/context.rb:88:in 'OpenTelemetry::Context.with_value'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace.rb:70:in 'OpenTelemetry::Trace#with_span'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/trace/tracer.rb:37:in 'OpenTelemetry::Trace::Tracer#in_span'
from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:38:in 'block in OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
from /usr/local/bundle/gems/opentelemetry-common-0.22.0/lib/opentelemetry/common/http/client_context.rb:49:in 'block in OpenTelemetry::Common::HTTP::ClientContext#with_attributes'
from /usr/local/bundle/gems/opentelemetry-api-1.5.0/lib/opentelemetry/context.rb:88:in 'OpenTelemetry::Context.with_value'
from /usr/local/bundle/gems/opentelemetry-common-0.22.0/lib/opentelemetry/common/http/client_context.rb:49:in 'OpenTelemetry::Common::HTTP::ClientContext#with_attributes'
from /usr/local/bundle/gems/opentelemetry-instrumentation-faraday-0.26.0/lib/opentelemetry/instrumentation/faraday/middlewares/tracer_middleware.rb:37:in 'OpenTelemetry::Instrumentation::Faraday::Middlewares::TracerMiddleware#call'
from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/middleware.rb:56:in 'Faraday::Middleware#call'
from /usr/local/bundle/gems/faraday-retry-2.2.1/lib/faraday/retry/middleware.rb:153:in 'Faraday::Retry::Middleware#call'
from /usr/local/bundle/gems/faraday-gzip-3.0.3/lib/faraday/gzip/middleware.rb:42:in 'Faraday::Gzip::Middleware#call'
from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/rack_builder.rb:152:in 'Faraday::RackBuilder#build_response'
from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/connection.rb:452:in 'Faraday::Connection#run_request'
from /usr/local/bundle/gems/faraday-2.12.2/lib/faraday/connection.rb:280:in 'Faraday::Connection#post'
It's kind of hard to tell which error comes first, or if the two are even truly connected, but I thought it might be helpful to see it as well.
I think what @jakeonfire and I found while troubleshooting further is that these errors come up when the server making the inbound request to Falcon closes the connection on a timeout. In our case we were catching external webhooks and doing an unreasonable amount of processing before responding with a 200. Since the processing was of no interest to the caller, we moved it into an async block and started responding right away; I believe the errors stopped immediately after that.
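Roughly, a sketch of that pattern (controller and job names are hypothetical):

class WebhooksController < ApplicationController
  def create
    payload = request.body.read

    # Defer the heavy processing; the webhook caller only needs the 200.
    # Under Falcon the request is served inside the reactor, so Async { ... }
    # spawns a task that runs concurrently with sending the response.
    Async do
      WebhookProcessor.call(payload) # hypothetical service object
    end

    head :ok
  end
end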
I've also been continuing to troubleshoot this and found out a few more things as well. I agree with @artrybalko in that we see the original error (Errno::EPIPE) when upstream clients close their connection before getting a response.
In my original post I said I couldn't trace this to a burst of requests, but I think that was incorrect. There actually are bursts; they just happen in such a small time span that I wasn't really seeing them in my metrics. When I say a small timespan, I'm talking about roughly one second: thousands of new connections and requests within a single second.
The spikes seem to be enough to overwhelm Falcon (and now I'm thinking this is really the OS and epoll?), and at that point it's just a cascade of failures: Falcon workers restart but there's still IO contention, so they aren't always coming back up; upstream load balancers close thousands of connections before Falcon has responded to them; and it's just a mess.
Now I'm thinking this is much closer to the problem mentioned in #212.
For now, I've put Nginx in front with some burstable rate limiting (with delays) to help smooth out traffic spikes like this. So far it's worked great. Today is the first day in a month where we didn't have an issue.
@ioquatix Do I understand that Falcon's current design will basically try to serve absolutely everything that gets thrown to it, regardless of the state of the system? Even to the point of basically taking itself down? Does what I describe, and the errors I'm seeing make sense to you?
The original issue posted by @michelson is with this code:
class Protocol::HTTP1::Connection
  # ...

  def write_response(version, status, headers, reason = Reason::DESCRIPTIONS[status])
    # Safari WebSockets break if no reason is given:
    @stream.write("#{version} #{status} #{reason}\r\n")
    write_headers(headers)
  end
end
In the latest version of this code, we now have a state machine which validates the behaviour:
class Protocol::HTTP1::Connection
  # ...

  def close(error = nil)
    @persistent = false

    if stream = @stream
      @stream = nil
      stream.close
    end

    unless closed?
      @state = :closed
      self.closed(error)
    end
  end

  # ...

  def write_response(version, status, headers, reason = Reason::DESCRIPTIONS[status])
    unless @state == :open or @state == :half_closed_remote
      raise ProtocolError, "Cannot write response in state: #{@state}!"
    end

    # Safari WebSockets break if no reason is given:
    @stream.write("#{version} #{status} #{reason}\r\n")
    write_headers(headers)
  end
end
So the original issue should now, at worst, fail with a Protocol::HTTP1::ProtocolError, and ideally the issue is actually handled correctly earlier on, as the state machine is much more comprehensive at dealing with client shenanigans.
The initial issues reported by @jakeonfire and @travisbell are caused by clients disconnecting before receiving the full response body. This leads to a "broken pipe" Errno::EPIPE error when the server attempts to write to a socket that the remote end has closed.
In more detail: An Errno::EPIPE ("Broken pipe") error simply indicates that the client (remote peer) closed the TCP connection on their side mid-response. This typically happens if the client disconnects (e.g. by sending a TCP FIN or triggering a TCP reset) before it has received the full HTTP/1.1 response. In other words, the server attempted to write to the socket, but the underlying TCP stack reported that the remote side has gone away (the socket was closed by the peer, not by the server). Such broken-pipe errors are common in real-world web environments – for example, when a user navigates away from a page early or a proxy times out a slow request – and they generally do not indicate a bug in the server.
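As a minimal illustration of the error class itself (using a local pipe rather than a TCP socket, but the failure mode is the same):

reader, writer = IO.pipe
reader.close          # the "remote" end goes away

writer.write("data")  # raises Errno::EPIPE (Broken pipe): nothing is left to read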
Do I understand that Falcon's current design will basically try to serve absolutely everything that gets thrown to it, regardless of the state of the system? Even to the point of basically taking itself down? Does what I describe, and the errors I'm seeing make sense to you?
Falcon mixes the accept call with the event loop. If the event loop gets saturated, the accept call will be invoked less frequently. This means the least loaded Falcon worker typically accepts the inbound connection first, which generally distributes connections evenly across a single container of Falcon servers. However, you'll still be depending on your load balancer to do the right thing if you have multiple containers on different network interfaces. In principle, Falcon will eventually accept every connection that is given to it (so you are correct), and it would be on the infrastructure to scale up once saturated. We now report a metric for this in the process title:
The string is documented here. But in our case, we are interested in L=... which indicates the scheduler load. If you see this getting above 0.8 for a sustained amount of time (e.g. more than a few seconds) you may be running close to saturation.
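For example, checking the title of a running worker (the PID is a placeholder):

# Print the process title of a Falcon worker; it includes "L=..." for the
# scheduler load. The exact format depends on the Falcon version.
falcon_pid = 12345
puts `ps -p #{falcon_pid} -o command=`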
On good hardware, Falcon can easily handle ~50,000 requests per second per event loop. So it's unlikely that Falcon itself is going to cause saturation in typical scenarios; it will be your application code that is essentially getting overloaded. In that case, we recently introduced the fiber-profiler to help detect poorly performing code which can stall the event loop. This isn't the only class of bugs, but it would be the biggest one and perhaps one of the easiest to address. Be aware that in its full tracing mode it can incur significant overhead, but if you can run it in a staging or shadow environment, it will tell you which code paths are causing contention on the event loop.
It does sound like there are issues here, but it's not entirely clear that the original issue and the follow-up discussion are related. Therefore, as the original issue has been addressed, I'm going to close this issue; but if you feel that you are still having problems, please feel free to open a new issue and I am happy to continue the discussion.
(Also you might be interested in https://github.com/socketry/falcon/issues/154#issuecomment-2832911337 too).
Okay, I've been investigating this issue, and noticed something in at least one scenario where EPIPE was showing up: the response was 304 Not Modified, but it also included a body, which is in violation of the HTTP RFCs.
As soon as we flush the response with transfer-encoding: chunked, the remote end closes the connection because the response is invalid.
I might try to fix this, but will also issue a warning, since a response like [304, {}, ["something"]] is invalid. I'm not sure of the exact nature of the response yet, but I thought I'd mention it here.
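In Rack terms, the difference is simply:

# Illustrative Rack responses: a 304 Not Modified must not carry a body.
invalid = [304, {}, ["something"]]  # body present: the peer may treat the response as malformed
valid   = [304, {}, []]             # empty body: correct for 304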
Tracking here: https://github.com/socketry/falcon/issues/289