bramble
Socket hang up with long running request to service
Hey!
I'm seeing odd behaviour with long-running requests to a service connected to the gateway.
I run a PHP-based service behind the gateway that sometimes needs up to 45s to return a response. In 90% of cases the gateway does not return the response and I get a Socket hang up error instead.
I already enabled the limits plugin with this configuration:
{
  "name": "limits",
  "config": {
    "max-response-time": "120s",
    "max-request-bytes": 1000000
  }
}
This removes the initial reached timeout error message, but the connection between the gateway and the service still somehow gets lost.
I am 100% sure that the response is correct and is returned properly by the backend service. Calling the backend service's GraphQL endpoint directly works without any issue.
Does that sound familiar, or do you have any hint where to look?
Thanks!
Hi @codedge, can you post a copy of the response you're getting? The string Socket hang up doesn't seem to show up when I search the Go stdlib, and you mention later that it's a reached timeout.
Does it log the request when it fails?
Sorry for the confusion.
1. Resolving the reached timeout
At first I got a reached timeout error. It was directly visible in Bramble's logs. I figured out that using the limits plugin with the configuration above gets around it.
This is solved ✔️
2. The Socket hang up problem
This error is returned by curl, Postman, or any other GraphQL client. There is no other response.
[Screenshot: error in curl]
[Screenshot: error in Postman]
I suspect this is some kind of keepAlive/idle timeout problem.
I also found this link, which talks about the net.http.Server.WriteTimeout.
It logs the request towards the backend service, but it does not log the response coming back.
...and I can confirm that with the WriteTimeout changed to, for example, 60 seconds
func runHandler(ctx context.Context, wg *sync.WaitGroup, name, addr string, handler http.Handler) {
	srv := &http.Server{
		Addr:         addr,
		Handler:      handler,
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 60 * time.Second,
		IdleTimeout:  120 * time.Second,
	}
	// ...
}
everything works flawlessly.
Do you think you can make this configurable via the limits plugin?
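For anyone who wants to reproduce the effect outside Bramble, here is a minimal sketch using only the Go standard library. It assumes nothing about Bramble's internals: `slowRequest` is a hypothetical helper that starts a test server with a given `WriteTimeout`, lets the handler sleep to stand in for the slow backend, and reports what the client sees.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowRequest (hypothetical helper) serves one GET through a handler that
// takes delayMS milliseconds to answer, on a server whose WriteTimeout is
// writeTimeoutMS milliseconds. It returns the error seen by the client,
// or nil if the full response arrived.
func slowRequest(writeTimeoutMS, delayMS int) error {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Stands in for the gateway waiting on a slow backend service.
		time.Sleep(time.Duration(delayMS) * time.Millisecond)
		fmt.Fprint(w, `{"data":{}}`)
	})

	srv := httptest.NewUnstartedServer(handler)
	srv.Config.WriteTimeout = time.Duration(writeTimeoutMS) * time.Millisecond
	srv.Start()
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.ReadAll(resp.Body)
	return err
}

func main() {
	// Handler is slower than the write deadline: the server drops the connection.
	fmt.Println("short WriteTimeout:", slowRequest(100, 400))
	// Handler finishes well inside the write deadline: the response arrives.
	fmt.Println("long WriteTimeout: ", slowRequest(2000, 100))
}
```

In the first call the client should get a connection-level error rather than an HTTP response, which is what curl and Postman surface as a hang up. The server itself logs nothing in this case, which would match the missing response log.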
Update
I can see that there are three server instances running: public, private, and metrics. I guess in my case only the public one is relevant.
Ideally the user is able to configure this for each of these three.
I would create a PR (if you don't find time).
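A rough sketch of what per-listener settings could look like follows. Everything here is an assumption for illustration: the `ListenerTimeouts` type, the JSON keys, and the `newServer` helper are hypothetical and not part of Bramble, and the fallback values are simply the ones mentioned in this thread (5s read, 10s write, 120s idle).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// ListenerTimeouts is a hypothetical config section for one listener.
// Durations use Go syntax such as "60s"; empty means "use the default".
type ListenerTimeouts struct {
	Read  string `json:"read-timeout"`
	Write string `json:"write-timeout"`
	Idle  string `json:"idle-timeout"`
}

// durationOr parses s, or returns fallback when s is empty.
func durationOr(s string, fallback time.Duration) (time.Duration, error) {
	if s == "" {
		return fallback, nil
	}
	return time.ParseDuration(s)
}

// newServer builds an http.Server for one listener from its timeout config.
// The fallback values are assumptions taken from this thread.
func newServer(addr string, h http.Handler, cfg ListenerTimeouts) (*http.Server, error) {
	read, err := durationOr(cfg.Read, 5*time.Second)
	if err != nil {
		return nil, fmt.Errorf("read-timeout: %w", err)
	}
	write, err := durationOr(cfg.Write, 10*time.Second)
	if err != nil {
		return nil, fmt.Errorf("write-timeout: %w", err)
	}
	idle, err := durationOr(cfg.Idle, 120*time.Second)
	if err != nil {
		return nil, fmt.Errorf("idle-timeout: %w", err)
	}
	return &http.Server{
		Addr:         addr,
		Handler:      h,
		ReadTimeout:  read,
		WriteTimeout: write,
		IdleTimeout:  idle,
	}, nil
}

func main() {
	// Only the public listener needs a longer write timeout in this scenario.
	raw := `{"public": {"write-timeout": "60s"}, "private": {}, "metrics": {}}`
	var cfg map[string]ListenerTimeouts
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	for _, name := range []string{"public", "private", "metrics"} {
		srv, err := newServer(":0", http.NotFoundHandler(), cfg[name])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: write=%s idle=%s\n", name, srv.WriteTimeout, srv.IdleTimeout)
	}
}
```

Keeping the timeouts per listener means the public endpoint can allow slow queries while the metrics endpoint keeps a tight deadline.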
Thanks for doing the debugging @codedge, that makes sense. If the write timeout is set to 10s by default, it will close the socket before your service has responded.
It would be useful to have bramble craft a timeout response in that situation but we can make the socket settings tuneable as well.
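One way to get a proper timeout response instead of a dropped socket is the standard library's `http.TimeoutHandler`. The sketch below is not how Bramble does it: `withTimeoutResponse`, `probe`, and the error payload in `timeoutBody` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeoutBody is a hypothetical error payload; the real shape is up to the gateway.
const timeoutBody = `{"errors":[{"message":"upstream service timed out"}]}`

// withTimeoutResponse wraps next so that a request running longer than limit
// is answered with 503 Service Unavailable and timeoutBody.
func withTimeoutResponse(next http.Handler, limit time.Duration) http.Handler {
	return http.TimeoutHandler(next, limit, timeoutBody)
}

// probe serves one request through a handler that takes delayMS milliseconds
// to answer, wrapped with a limitMS timeout, and returns status code and body.
func probe(limitMS, delayMS int) (int, string, error) {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Duration(delayMS) * time.Millisecond):
			fmt.Fprint(w, `{"data":{}}`)
		case <-r.Context().Done():
			// The request context is cancelled once the limit is hit.
		}
	})
	limit := time.Duration(limitMS) * time.Millisecond
	srv := httptest.NewServer(withTimeoutResponse(slow, limit))
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	status, body, err := probe(100, 400) // handler is slower than the limit
	fmt.Println(status, body, err)
	status, body, err = probe(1000, 50) // handler finishes in time
	fmt.Println(status, body, err)
}
```

For the timeout response to reach the client, the handler limit has to be shorter than the server's `WriteTimeout`; otherwise the socket is closed first.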
The public and private muxes are there so that plugins can apply different middleware to a published endpoint and an internal endpoint. For example, we have auth on the public mux, which is exposed via ingress to our webapp, while the private mux serves backend services inside our VPC.
Is there a release planned to include the new configuration?