
Intermittent 502s when running SpiceDB behind AWS ALB

Open alexshanabrook opened this issue 5 months ago • 1 comments

What platforms are affected?

linux

What architectures are affected?

amd64

What SpiceDB version are you using?

v1.34.0-amd64

Steps to Reproduce

When we make basic check-permission requests in a loop through an AWS ALB to SpiceDB, the following error comes up after some amount of time (usually within ~20 minutes):

Err terminated with errors error="rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway)"

This error resolves on subsequent requests without any changes on our side, but in the meantime a request has failed, and the error continues to come up at seemingly arbitrary points if we let the loop continue. The SpiceDB target group protocol is set to gRPC.
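To pin down how often and when the failures occur, the loop can record each 502 instead of stopping at the first one. The following is only a minimal sketch of such a harness, not our actual client code: `run_check_loop` and its `check` argument are hypothetical names, and `check` stands in for whatever function issues a single check-permission request and raises on failure.

```python
import time
from typing import Callable, List, Tuple


def run_check_loop(
    check: Callable[[], None],
    iterations: int,
    delay_s: float = 0.0,
) -> List[Tuple[int, float, str]]:
    """Call `check` repeatedly and record every failure that mentions a 502.

    `check` is a hypothetical callable that performs one check-permission
    request and raises an exception when the request fails.
    Returns a list of (iteration index, seconds since start, error message).
    """
    failures: List[Tuple[int, float, str]] = []
    start = time.monotonic()
    for i in range(iterations):
        try:
            check()
        except Exception as exc:
            message = str(exc)
            # Only the intermittent 502 is of interest here; re-raise anything else.
            if "502" not in message:
                raise
            failures.append((i, time.monotonic() - start, message))
        if delay_s:
            time.sleep(delay_s)
    return failures
```

The recorded timestamps can then be lined up against the ALB access log entries for the same period.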

Expected Result

We expect not to see these transient errors running SpiceDB behind an ALB.

Looking into the access logs for our ALB, we see the 502. In the log entry, request_processing_time and target_processing_time are set, meaning the request reached SpiceDB. We're not sure how to interpret the rest: target_processing_time being set suggests the load balancer may have received headers from SpiceDB, but response_processing_time is -1, meaning the load balancer didn't receive a response from the target. AWS suggests that this can happen when either the target closed the connection while the load balancer had an outstanding request, or the target response is malformed or contains invalid HTTP headers.
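For anyone wanting to find the same pattern in their own logs, here is a rough sketch of how the three timing fields could be pulled out of an ALB access log line. It assumes the standard space-separated ALB log layout with quoted request and user-agent fields. The helper names and the sample line are made up for illustration and are not taken from our actual logs.

```python
import shlex

# Leading fields of an ALB access log entry, in order (assumed standard layout).
FIELDS = [
    "type", "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
]
TIME_FIELDS = (
    "request_processing_time",
    "target_processing_time",
    "response_processing_time",
)


def parse_alb_entry(line: str) -> dict:
    """Split one log line (respecting quoted fields) and name the leading fields."""
    entry = dict(zip(FIELDS, shlex.split(line)))
    for key in TIME_FIELDS:
        entry[key] = float(entry[key])
    return entry


def reached_target_but_no_response(entry: dict) -> bool:
    """True for the pattern described above: a 502 where the request and target
    times are set but response_processing_time is -1."""
    return (
        entry["elb_status_code"] == "502"
        and entry["request_processing_time"] >= 0
        and entry["target_processing_time"] >= 0
        and entry["response_processing_time"] == -1
    )


# Made-up sample line, shaped like an ALB entry; all values are illustrative.
SAMPLE = (
    'h2 2024-09-05T17:47:54.000000Z app/example-alb/0123456789abcdef '
    '10.0.0.1:40000 10.0.0.2:50051 0.001 0.002 -1 502 - 100 200 '
    '"POST https://example.internal:443/authzed.api.v1.PermissionsService/CheckPermission HTTP/2.0" '
    '"grpc-go/1.0" - -'
)

if __name__ == "__main__":
    print(reached_target_but_no_response(parse_alb_entry(SAMPLE)))
```

Filtering a full log file with `reached_target_but_no_response` would show whether every 502 follows this same pattern or only some of them.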

We tried setting the --grpc-max-conn-age flag to a large value to rule out SpiceDB's keep-alive being shorter than the load balancer's timeout, but still saw the same errors.

Actual Result

Error:

[Screenshot: 2024-09-05 at 1:47:54 PM]

alexshanabrook, Sep 06 '24 21:09