Possible problem with requests timing out

Open BRMatt opened this issue 12 years ago • 7 comments

It seems l2met is taking a long time to accept some requests, causing them to time out and trigger Heroku 503 errors. The logs show the 30 seconds or so leading up to the errors. Is there any other information that'd be useful?

The instance is running cfa2fc0cebd2586d9e2b9ee94905eed6e8cf5703 on Heroku, with a two-line change to measure the size of deadline misses.

BRMatt avatar Aug 19 '13 09:08 BRMatt

@BRMatt Interesting. Can you show me your Procfile and any relevant environment variables?

ryandotsmith avatar Aug 19 '13 15:08 ryandotsmith

Procfile:

web: ./l2met -receiver=true -outlet=true -port=$PORT -outlet-ttl=10s -recv-deadline=4

Added the -recv-deadline flag this morning after those errors were reported. The env vars are pretty standard: METCHAN_URL, APP_NAME and SECRETS.

BRMatt avatar Aug 19 '13 15:08 BRMatt

Updated the gist with another onslaught of 5XX errors. It seems the app received a large number of log payloads in a short amount of time and was unable to cope with new connections?

BRMatt avatar Aug 25 '13 15:08 BRMatt

@BRMatt How strange. Can you add Heroku runtime metrics to this app? I would be curious to see the metal metrics on this dyno during these turbulent times.

From the logs, it looks like you are doing fewer than 100 HTTP requests per second. I have benchmarked l2met at much higher throughput.

ryandotsmith avatar Aug 25 '13 16:08 ryandotsmith

Sure thing, here are some more logs covering about 2 minutes prior to some request timeouts. By the looks of things, the dyno's not under any stress at all.

BRMatt avatar Aug 25 '13 22:08 BRMatt

Do you have to perform any actions to bring the system back to a healthy state? How do you recover? Also, how are you noticing these problems?

ryandotsmith avatar Aug 26 '13 03:08 ryandotsmith

> Do you have to perform any actions to bring the system back to a healthy state? How do you recover?

We don't do anything; the errors are often sporadic. Some seem to occur near a dyno restart (in which case we get H13, "Connection closed without response" errors), though most appear to happen randomly.

> Also, how are you noticing these problems?

The logs are piped through Papertrail, which emails me whenever l2met returns a status code other than 200.

BRMatt avatar Aug 26 '13 12:08 BRMatt
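
For anyone wanting to reproduce the "alert on anything other than 200" check locally instead of through Papertrail, here is a minimal sketch. It assumes the logs are Heroku router lines made of key=value pairs with a `status` field, plus `code` and `desc` fields on errors. The names `parse_pairs` and `failed_requests` are hypothetical, and the sample lines are invented for illustration; they are not taken from the gist in this thread.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a run of
# non-whitespace characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_pairs(line):
    """Return the key=value pairs found in one log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def failed_requests(lines):
    """Return the parsed entries that carry a status other than 200."""
    failures = []
    for line in lines:
        fields = parse_pairs(line)
        status = fields.get("status")
        # Lines without a status field (e.g. app output) are skipped.
        if status is not None and status != "200":
            failures.append(fields)
    return failures


# Invented sample lines (assumption: router-style key=value format).
SAMPLE = [
    'at=info method=POST path="/logs" dyno=web.1 connect=1ms service=12ms status=200 bytes=0',
    'at=error code=H12 desc="Request timeout" method=POST path="/logs" dyno=web.1 connect=2ms service=30000ms status=503 bytes=0',
    'at=error code=H13 desc="Connection closed without response" method=POST path="/logs" dyno=web.1 status=503 bytes=0',
]

if __name__ == "__main__":
    for entry in failed_requests(SAMPLE):
        print(entry.get("code", "-"), entry["status"], entry.get("desc", ""))
```

Grouping the failures by `code` would separate timeouts from the H13 errors seen around dyno restarts, which may help narrow down whether the two have different causes.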