django-prometheus

PrometheusEndpointServer throws an exception, after which the endpoint is unavailable and does not restart

Open NitroLine opened this issue 1 year ago • 0 comments

I have a Django app running on uvicorn. I use PROMETHEUS_METRICS_EXPORT_PORT_RANGE=range(8001, 8011) to start a metrics endpoint in each uvicorn worker. It works fine.
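For reference, a minimal sketch of what that setting might look like in settings.py. Only the port range comes from this report; the address line is an assumption added for illustration.

```python
# settings.py (fragment)

# range(8001, 8011) covers ports 8001 through 8010, so up to ten workers
# can each bind their own metrics port.
PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8011)

# Assumption (not stated in this report): an empty string binds on all interfaces.
PROMETHEUS_METRICS_EXPORT_ADDRESS = ""
```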

But after a network error on the server, some workers print an exception:

Exception occurred during processing of request from ('106.75.72.22', 52046)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe

Or

Exception occurred during processing of request from ('162.142.125.223', 51380)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
ConnectionResetError: [Errno 104] Connection reset by peer

After that, some targets in Prometheus start showing the error "context deadline exceeded". (I saw such tracebacks four times in the logs, and four targets are now down.)

So I think the PrometheusEndpointServer process has crashed and won't restart, and I'm losing some metrics because of that. It would be great if the exporter server restarted automatically when it becomes unavailable.
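Until something like that exists, one way to confirm which endpoints have stopped answering is to probe each port from the server itself. The following is a minimal sketch using only the Python standard library; the function names `probe` and `unresponsive_ports`, the `/metrics` path, the host, and the 3-second timeout are all assumptions for illustration and are not part of django-prometheus.

```python
import http.client
import urllib.error
import urllib.request


def probe(port, host="127.0.0.1", timeout=3.0, path="/metrics"):
    """Return True if http://host:port/path answers HTTP 200 within `timeout` seconds."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, resets, timeouts, and malformed replies
        # all count as "not responding".
        return False


def unresponsive_ports(ports, **kwargs):
    """Return the ports from `ports` whose endpoint did not answer."""
    return [p for p in ports if not probe(p, **kwargs)]
```

Calling `unresponsive_ports(range(8001, 8011))` on the server would list the ports whose endpoint no longer answers, which can then be matched against the targets that Prometheus reports as down.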

NitroLine, Sep 05 '23 09:09