fastapi-httpbin
fastapi-httpbin copied to clipboard
Production httpbin site randomly timing out
A few days ago I noticed https://httpbin.dmuth.org/ started hanging for no reason. My dashboards would look like this:
...and I started seeing errors like these in the logs from fly.io:
could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
I then SSHed into the instance and saw that uvicorn
was using 100% of the CPU.
I also poked around in /proc/
and saw that there was only about a dozen file descriptors open, so it's not a resource exhaustion issue.
I tried the following things so far, but have been unable to resolve it:
- ✅ Restarting the VM
- ✅ Changing the count of machines with the
fly scale
command to 0 and then 1 to spin up a new machine - ✅ Running
fly deploy
again - ✅ Turning off Fly's raw TCP check, thinking it was tripping up Uvicorn somehow.
I am continuing to investigate, and have a few other things to try:
- ✅ Turning off the HTTP check from Fly.io
- ✅ Adjusting the URLs that NodePing is hitting
- ✅ Upgrading FastAPI to the latest version and redeploying (this is in progress)
- ✅ Increase the number of workers to 3
- Seeing if I can capture log output from Uvicorn by setting an environment variable.
- Changing the server to Hypercorn