prometheus.ex
Prometheus.Metric breaks when `:application_controller` is busy
`:application_controller` can be busy for other reasons, but the case I've seen is draining connections during shutdown with a library like https://hexdocs.pm/plug_cowboy/Plug.Cowboy.Drainer.html. Without connection draining, the endpoint shuts down immediately on SIGTERM, killing any open connections. With connection draining, the listeners on the port are suspended so no new connections are opened, the existing connections are allowed to drain, and then (and only then) the endpoint proceeds with shutting down.
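For context, a drainer setup of the kind described above might look roughly like this. It is a sketch, not my exact config: `MyApp.Endpoint` and the `MyApp.Endpoint.HTTP` listener ref are placeholder names, and the 30-second `:shutdown` value is only an example of a drain window longer than 5 seconds.

```elixir
# Fragment of a hypothetical Application.start/2 callback.
# MyApp.Endpoint and MyApp.Endpoint.HTTP are placeholder names.
children = [
  MyApp.Endpoint,
  # Listed after the endpoint so it is stopped first on shutdown:
  # it suspends the listeners, then waits up to :shutdown ms
  # for the open connections to finish.
  {Plug.Cowboy.Drainer, refs: [MyApp.Endpoint.HTTP], shutdown: 30_000}
]

Supervisor.start_link(children, strategy: :one_for_one)
```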
During a drain, `:application_controller` asks the application containing the endpoint to shut down and waits for it to finish. While it's waiting, it's completely blocked and can't respond to messages. Depending on how long your draining timeout is, this can be a long time. Prometheus.Metric uses `Application.started_applications`, which sends a message to `:application_controller` and waits for the timeout (5 seconds) for a response. While connections are draining, this call will always fail, causing Prometheus.Metric to blow up (which also means Prometheus.PlugExporter blows up when Prometheus tries to scrape). If it's helpful, I can set up a repo that reproduces this.
Is it possible to avoid calling `Application.started_applications`? Or to catch the failure?
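To illustrate what I mean by catching the failure, here is a minimal sketch. The `SafeApps` module and its `started_applications/1` function are hypothetical names, not part of prometheus.ex, and I'm assuming the timeout surfaces as an exit from the underlying call to `:application_controller`.

```elixir
defmodule SafeApps do
  # Hypothetical helper, not part of prometheus.ex.
  # Wraps Application.started_applications/1 so that a blocked
  # :application_controller yields an error tuple instead of
  # crashing the caller.
  def started_applications(timeout \\ 5_000) do
    {:ok, Application.started_applications(timeout)}
  catch
    # A call that times out (or otherwise fails) exits the caller;
    # convert that exit into a value the caller can handle.
    :exit, reason -> {:error, reason}
  end
end
```

A caller could then fall back to a cached or empty application list on `{:error, _}` rather than failing the scrape.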
I may also be missing something, but why is `on_load` called each time a request hits Prometheus.PlugExporter?
Good questions, will look soon.
Hey! Any updates on this? ❤️
Also running into this issue
lol, forgot about this one