Prometheus.Metric breaks when `:application_controller` is busy

Open joladev opened this issue 4 years ago • 4 comments

There may be other reasons for `:application_controller` to be busy, but the one I've seen is draining connections during shutdown with a library like https://hexdocs.pm/plug_cowboy/Plug.Cowboy.Drainer.html. Without connection draining, the endpoint shuts down immediately on receiving a SIGTERM, killing any current connections. With connection draining, the listeners on the port are suspended so that no new connections are opened, the existing connections are allowed to drain, and then (and only then) the endpoint proceeds with shutting down.

This means `:application_controller` asks the application containing the endpoint to shut down and waits for it to finish. While it's waiting, it's completely blocked and can't respond to messages. Depending on how long your draining timeout is, this can be a long time. `Prometheus.Metric` uses `Application.started_applications`, which sends a message to `:application_controller` and waits up to the default timeout (5 seconds) for a response. While connections are draining, this call times out, causing `Prometheus.Metric` to blow up (which also means `Prometheus.PlugExporter` blows up when Prometheus tries to scrape). If it's helpful, I can set up a repo that reproduces this.
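To illustrate the mechanism without a full shutdown, here is a minimal sketch. `BusyServer` and `BusyDemo` are hypothetical names, and the GenServer is only a stand-in for `:application_controller`: it is blocked in a long-running callback, so a call with a shorter timeout exits instead of returning.

```elixir
defmodule BusyServer do
  # Hypothetical stand-in for :application_controller: a GenServer that
  # cannot answer calls while it is stuck in a long-running callback.
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:block, ms}, state) do
    # Simulates waiting for an application to finish shutting down.
    Process.sleep(ms)
    {:noreply, state}
  end

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

defmodule BusyDemo do
  # Returns :timed_out when the server stays blocked longer than
  # call_timeout, and :pong when it becomes free in time.
  def ping_while_busy(block_ms \\ 500, call_timeout \\ 100) do
    {:ok, pid} = GenServer.start_link(BusyServer, nil)

    # The cast is queued ahead of the call below, so the server is
    # already busy when the :ping request arrives.
    GenServer.cast(pid, {:block, block_ms})

    try do
      GenServer.call(pid, :ping, call_timeout)
    catch
      # A timed-out GenServer.call exits the caller with {:timeout, _}.
      :exit, {:timeout, _} -> :timed_out
    end
  end
end
```

With the defaults, the server is blocked for 500 ms and the call gives up after 100 ms. That is the same shape as a 5 second call against a controller that is blocked for the whole drain period.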

Is it possible to avoid calling `Application.started_applications`, or to catch the failure?
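As a rough sketch of the "catch the failure" option: the wrapper below is hypothetical (`SafeApps` is not part of prometheus.ex) and only assumes that a timed-out call to `Application.started_applications/1` exits with a `{:timeout, _}` reason, as `GenServer` calls normally do.

```elixir
defmodule SafeApps do
  # Hypothetical helper, not part of prometheus.ex: turns the exit raised
  # by a timed-out call into an error tuple the caller can handle.
  def started_applications(timeout \\ 5_000) do
    {:ok, Application.started_applications(timeout)}
  catch
    # Raised when :application_controller does not reply in time.
    :exit, {:timeout, _} -> {:error, :timeout}
  end
end
```

A caller could then fall back to a cached or empty list on `{:error, :timeout}` instead of crashing the scrape. Whether that is acceptable for `Prometheus.Metric` is a design question for the maintainers.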

I may also be missing something, but why is `on_load` called each time a request hits `Prometheus.PlugExporter`?

joladev avatar Dec 16 '20 13:12 joladev

Good questions, will look soon.

deadtrickster avatar Jan 11 '21 14:01 deadtrickster

Hey! Any updates on this? ❤️

joladev avatar Jan 22 '21 11:01 joladev

Also running into this issue

sorliem avatar Mar 21 '23 14:03 sorliem

lol, forgot about this one

ikavgo avatar Mar 28 '23 09:03 ikavgo