django-prometheus
PrometheusEndpointServer throws an exception, after which the endpoint is no longer available and does not restart
I have a Django app running on uvicorn. I use PROMETHEUS_METRICS_EXPORT_PORT_RANGE=range(8001, 8011) to start a metrics endpoint on each uvicorn worker. It works fine.
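A port range lets every uvicorn worker claim its own port: roughly, each worker walks the range and keeps the first port it can bind. A minimal stdlib sketch of that idea follows. `bind_first_free_port` and `StaticMetricsHandler` are hypothetical names, not django-prometheus code, and a real exporter would serve `prometheus_client`'s `MetricsHandler` instead of the fixed text used here.

```python
import http.server
import threading


class StaticMetricsHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for prometheus_client's MetricsHandler."""

    def do_GET(self):
        body = b"demo_up 1\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def bind_first_free_port(port_range, addr="127.0.0.1"):
    """Bind the first free port in port_range and serve on it from a daemon thread."""
    for port in port_range:
        try:
            server = http.server.HTTPServer((addr, port), StaticMetricsHandler)
        except OSError:
            continue  # port already in use, most likely by another worker
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server, port
    raise RuntimeError(f"no free port in {port_range!r}")
```

Called from ten workers with range(8001, 8011), each worker would end up on a distinct port from the range, assuming nothing else already holds those ports; an eleventh worker would hit the RuntimeError.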
But after some network error on the server, some workers print exceptions like this:
Exception occurred during processing of request from ('106.75.72.22', 52046)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
Or
Exception occurred during processing of request from ('162.142.125.223', 51380)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
ConnectionResetError: [Errno 104] Connection reset by peer
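Both tracebacks end inside do_GET while writing the response to a client that had already disconnected (EPIPE and ECONNRESET). In a plain `socketserver`-based server, an exception raised while handling a request is caught by the server loop and passed to `handle_error()`, which prints the "Exception occurred during processing of request from ..." banner seen above, and the loop then goes on accepting connections. The stdlib sketch below checks that behaviour by raising `BrokenPipeError` by hand for a hypothetical `/boom` path and then sending a normal request. It assumes the exporter is built on `http.server`, as the traceback frames suggest; `FlakyHandler`, `CountingServer` and `get` are illustrative names.

```python
import http.client
import http.server
import threading


class FlakyHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that fails on /boom the way the tracebacks do."""

    def do_GET(self):
        if self.path == "/boom":
            # Simulate the client vanishing while the response is being written.
            raise BrokenPipeError(32, "Broken pipe (simulated)")
        body = b"demo_up 1\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request access logs


class CountingServer(http.server.HTTPServer):
    """Non-threaded HTTPServer that counts exceptions reported through handle_error()."""

    errors = 0

    def handle_error(self, request, client_address):
        self.errors += 1
        super().handle_error(request, client_address)  # prints the banner and traceback to stderr


def get(port, path):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


server = CountingServer(("127.0.0.1", 0), FlakyHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    get(port, "/boom")
except (http.client.HTTPException, OSError):
    pass  # the server dropped this connection without sending a response
status, body = get(port, "/metrics")  # the same server still answers
server.shutdown()
server.server_close()
print(server.errors, status, body)
```

Here `server.errors` ends up as 1 and the second request still gets a 200, which is consistent with the idea that a broken pipe on one scrape does not, by itself, stop a `socketserver` loop.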
And some targets in Prometheus started showing the error "context deadline exceeded".
(I saw this traceback four times in the logs, and four targets are down now.)
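Prometheus reports "context deadline exceeded" when a scrape does not finish within the job's scrape_timeout (10s by default), so a target can show it while the endpoint is still alive but not answering. In both tracebacks process_request calls finish_request directly, which suggests a non-threaded server that handles one connection at a time. If that holds, a client that opens a TCP connection and never sends a request would block every later scrape. The stdlib sketch below reproduces that situation and one possible mitigation, a per-connection `timeout` on the handler class. `MetricsStub`, `TimeoutMetricsStub` and `scrape_after_idle_client` are hypothetical, and whether django-prometheus's endpoint actually behaves this way is an assumption.

```python
import http.server
import socket
import threading
import time
import urllib.request


class MetricsStub(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real metrics handler."""

    def do_GET(self):
        body = b"demo_up 1\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class TimeoutMetricsStub(MetricsStub):
    # StreamRequestHandler applies this to every accepted socket, so a silent
    # client is dropped after 1 second instead of holding the only thread.
    timeout = 1


def scrape_after_idle_client(handler_cls, scrape_timeout=3.0):
    """Park a silent client on a non-threaded server, then try a normal scrape."""
    server = http.server.HTTPServer(("127.0.0.1", 0), handler_cls)  # one request at a time
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    idle = socket.create_connection(("127.0.0.1", port))  # connects, never sends a request
    time.sleep(0.2)  # let the server accept it and start waiting for a request line
    try:
        url = f"http://127.0.0.1:{port}/metrics"
        with urllib.request.urlopen(url, timeout=scrape_timeout) as resp:
            return resp.status == 200
    except OSError:  # the scrape timed out behind the idle client
        return False
    finally:
        idle.close()  # unblocks the server thread so shutdown() can complete
        server.shutdown()
        server.server_close()
```

`scrape_after_idle_client(MetricsStub)` returns False after roughly the 3-second scrape timeout, while `scrape_after_idle_client(TimeoutMetricsStub)` returns True because the idle connection is dropped after 1 second. `http.server.ThreadingHTTPServer` would be another way to keep one slow client from starving the rest.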
So I think the PrometheusEndpointServer process has crashed and won't restart, and I'm losing some metrics because of it. It would be cool if the exporter server restarted automatically whenever it became unavailable.
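Until something like that exists in the library, a watchdog thread in the application could approximate it. A minimal sketch, assuming you control how the endpoint is started: `start_endpoint`, `EndpointWatchdog` and their parameters are hypothetical, not django-prometheus API. Every `interval` seconds the watchdog fetches /metrics on 127.0.0.1 with a timeout; after `max_failures` consecutive failures it stops the old server (without waiting forever if its loop is stuck) and binds a new one on the same port. The stand-in endpoint uses `ThreadingHTTPServer` plus a handler `timeout`, so a single stuck client cannot hold the only serving thread.

```python
import http.server
import logging
import threading
import urllib.request

log = logging.getLogger("metrics-watchdog")


class MetricsStub(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real metrics handler."""

    timeout = 5  # drop clients that connect but never finish a request

    def do_GET(self):
        body = b"demo_up 1\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_endpoint(port):
    """Hypothetical factory: bind a threaded server on `port` and serve it in the background."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), MetricsStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class EndpointWatchdog(threading.Thread):
    """Probe the endpoint periodically and rebuild it after repeated failures."""

    def __init__(self, port, factory, interval=15.0, timeout=5.0, max_failures=2):
        super().__init__(daemon=True)
        self.factory = factory
        self.interval = interval
        self.timeout = timeout
        self.max_failures = max_failures
        self.server = factory(port)
        self.port = self.server.server_address[1]  # actual port, even if 0 was passed
        self.restarts = 0
        self.stop_event = threading.Event()  # not "_stop": Thread uses that name internally

    def healthy(self):
        url = f"http://127.0.0.1:{self.port}/metrics"
        try:
            with urllib.request.urlopen(url, timeout=self.timeout) as resp:
                resp.read()
                return True
        except OSError:  # URLError, refused connections and timeouts are all OSError subclasses
            return False

    def restart(self):
        log.warning("metrics endpoint on port %d is unresponsive, restarting", self.port)
        old = self.server
        stopper = threading.Thread(target=old.shutdown, daemon=True)
        stopper.start()
        stopper.join(self.timeout)  # a wedged serve_forever() loop never acknowledges shutdown
        old.server_close()
        self.server = self.factory(self.port)
        self.restarts += 1

    def run(self):
        failures = 0
        while not self.stop_event.wait(self.interval):
            failures = 0 if self.healthy() else failures + 1
            if failures >= self.max_failures:
                try:
                    self.restart()
                    failures = 0
                except OSError:
                    log.exception("restart on port %d failed, will retry", self.port)

    def stop(self):
        self.stop_event.set()
```

In a worker, this would be started once with the port that worker picked from its range, e.g. `EndpointWatchdog(port, start_endpoint).start()`. Probing over HTTP rather than checking whether the serving thread is alive means the watchdog also catches an endpoint that is running but no longer answering.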