fastapi-httpbin Production httpbin site randomly timing out

Open dmuth opened this issue 4 months ago • 0 comments

A few days ago I noticed that https://httpbin.dmuth.org/ had started hanging for no apparent reason. My dashboards looked like this:

Screenshot by Dropbox Capture

...and I started seeing errors like these in the logs from fly.io:

could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)

I then SSHed into the instance and saw that uvicorn was using 100% of the CPU.

I also poked around in /proc/ and saw that there were only about a dozen file descriptors open, so it doesn't look like file descriptor exhaustion.
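For anyone who wants to repeat the /proc checks described above, here is a minimal sketch. It assumes a Linux host with /proc mounted; the helper names `count_open_fds` and `cpu_ticks` are made up for illustration and are not part of fastapi-httpbin. You would pass the PID of the Uvicorn process yourself (the sketch just inspects its own process).

```python
import os
from pathlib import Path
from typing import Optional


def count_open_fds(pid: int) -> Optional[int]:
    """Count entries in /proc/<pid>/fd, or return None if it cannot be read."""
    fd_dir = Path("/proc") / str(pid) / "fd"
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        # No /proc (non-Linux), no such process, or insufficient permissions.
        return None


def cpu_ticks(pid: int) -> Optional[int]:
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat, or None."""
    stat_path = Path("/proc") / str(pid) / "stat"
    try:
        raw = stat_path.read_text()
    except OSError:
        return None
    # The second field (comm) is wrapped in parentheses and may contain
    # spaces, so split on the last ")" before splitting on whitespace.
    fields = raw.rsplit(")", 1)[1].split()
    # After the split, fields[0] is the process state (field 3 of the file),
    # so utime (field 14) and stime (field 15) are at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


if __name__ == "__main__":
    pid = os.getpid()  # replace with the Uvicorn PID when debugging
    print("open fds:", count_open_fds(pid))
    print("cpu ticks:", cpu_ticks(pid))
```

Sampling `cpu_ticks` twice a few seconds apart gives a rough idea of whether the process is actually burning CPU, which is another way to confirm what `top` shows.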

I have tried the following so far, but none of it has resolved the problem:

✅ Restarting the VM
✅ Changing the count of machines with the fly scale command to 0 and then 1 to spin up a new machine
✅ Running fly deploy again
✅ Turning off Fly's raw TCP check, thinking it was tripping up Uvicorn somehow.

I am continuing to investigate, and have a few other things to try:

✅ Turning off the HTTP check from Fly.io
✅ Adjusting the URLs that NodePing is hitting
✅ Upgrading FastAPI to the latest version and redeploying (this is in progress)
✅ Increasing the number of workers to 3
Seeing if I can capture log output from Uvicorn by setting an environment variable.
Changing the server to Hypercorn
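As a rough illustration of the last few items, the sketch below builds the command lines for running with 3 workers and more verbose logging under Uvicorn, and the equivalent under Hypercorn. Several things here are assumptions: the import string `main:app` is a placeholder (the issue does not say how the app is launched), and `UVICORN_LOG_LEVEL` relies on Uvicorn's documented `UVICORN_` environment variable prefix, which may or may not be the variable meant above.

```python
import os
import shlex
from typing import Dict, List

APP = "main:app"  # hypothetical import string; adjust to the real module


def uvicorn_command(app: str = APP, workers: int = 3,
                    log_level: str = "debug") -> List[str]:
    """Build a Uvicorn command line with multiple workers and verbose logs."""
    return ["uvicorn", app, "--workers", str(workers),
            "--log-level", log_level]


def hypercorn_command(app: str = APP, workers: int = 3) -> List[str]:
    """Build the roughly equivalent Hypercorn command line."""
    return ["hypercorn", app, "--workers", str(workers)]


def uvicorn_env(log_level: str = "debug") -> Dict[str, str]:
    """Copy the current environment and set the log level via env var.

    Assumption: Uvicorn's CLI picks up options from UVICORN_-prefixed
    environment variables, so this mirrors --log-level.
    """
    env = dict(os.environ)
    env["UVICORN_LOG_LEVEL"] = log_level
    return env


if __name__ == "__main__":
    # Print the commands rather than executing them, so this is safe to run.
    print(shlex.join(uvicorn_command()))
    print(shlex.join(hypercorn_command()))
```

To actually launch a server, the list can be passed to `subprocess.run(uvicorn_command(), env=uvicorn_env())`, provided the corresponding package is installed.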

Feb 06 '24 22:02 dmuth
