fastapi-httpbin

Production httpbin site randomly timing out

Open dmuth opened this issue 4 months ago • 0 comments

A few days ago I noticed that https://httpbin.dmuth.org/ had started hanging for no apparent reason. My dashboards looked like this:

Screenshot by Dropbox Capture

Screenshot by Dropbox Capture

...and I started seeing errors like these in the logs from fly.io:

could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)

I then SSHed into the instance and saw that uvicorn was using 100% of the CPU.

I also poked around in /proc/ and saw that there were only about a dozen file descriptors open, so file descriptor exhaustion does not appear to be the cause.
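For anyone wanting to repeat the file descriptor check, here is a minimal Python sketch of one way to do it. It assumes a Linux-style /proc layout; the helper names `count_open_fds` and `soft_fd_limit` and the `proc_root` parameter are hypothetical and are not part of fastapi-httpbin.

```python
import os
from pathlib import Path


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd (one entry per open descriptor)."""
    return len(os.listdir(Path(proc_root) / str(pid) / "fd"))


def soft_fd_limit(pid, proc_root="/proc"):
    """Read the soft 'Max open files' limit from <proc_root>/<pid>/limits.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    limits = (Path(proc_root) / str(pid) / "limits").read_text()
    for line in limits.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max", "open", "files", <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


# Example (on the instance, with the uvicorn PID substituted for os.getpid()):
#   pid = os.getpid()
#   print(count_open_fds(pid), "open, soft limit", soft_fd_limit(pid))
```

Reading another process's fd directory generally requires running as the same user or as root. A count far below the soft limit is consistent with the conclusion above.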

I have tried the following so far, but none of it has resolved the problem:

  • ✅ Restarting the VM
  • ✅ Changing the count of machines with the fly scale command to 0 and then 1 to spin up a new machine
  • ✅ Running fly deploy again
  • ✅ Turning off Fly's raw TCP check, thinking it was tripping up Uvicorn somehow.

I am continuing to investigate and have a few more things to try:

  • ✅ Turning off the HTTP check from Fly.io
  • ✅ Adjusting the URLs that NodePing is hitting
  • ✅ Upgrading FastAPI to the latest version and redeploying (this is in progress)
  • ✅ Increasing the number of workers to 3
  • Seeing if I can capture log output from Uvicorn by setting an environment variable.
  • Changing the server to Hypercorn
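As a rough sketch of the worker-count and logging items above, the following Python builds a Uvicorn command line with 3 workers and an environment that raises the log level. The issue does not say which environment variable was meant; `UVICORN_LOG_LEVEL` is used here as an assumption, based on Uvicorn's CLI reading options from `UVICORN_`-prefixed variables. The app path `main:app`, the host, the port, and both helper names are hypothetical placeholders.

```python
import os


def uvicorn_command(app="main:app", workers=3, port=8000):
    """Build an argv list for the uvicorn CLI (app path and port are placeholders)."""
    return [
        "uvicorn", app,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--workers", str(workers),
    ]


def uvicorn_env(log_level="debug", base=None):
    """Return a copy of the environment with UVICORN_LOG_LEVEL set (assumed variable)."""
    env = dict(os.environ if base is None else base)
    env["UVICORN_LOG_LEVEL"] = log_level
    return env


# Example launch (requires uvicorn to be installed):
#   import subprocess
#   subprocess.run(uvicorn_command(), env=uvicorn_env(), check=True)
```

Keeping the command and environment in small functions makes it easy to try different worker counts or log levels between deploys without editing the launch line by hand.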

dmuth, Feb 06 '24 22:02