nautobot-app-device-lifecycle-mgmt

The application makes the `/metrics` endpoint slow with a large number of devices

Status: Open. Opened by ubajze 1 year ago; 2 comments.

Environment

  • Python version: 3.11.7
  • Nautobot version: 2.1.2
  • nautobot-device-lifecycle-mgmt version: 2.0.3

Expected Behavior

The page should load reasonably quickly and should not block uWSGI for a significant amount of time.

Observed Behavior

I have Prometheus configured to scrape the Nautobot `/metrics` endpoint every minute. Loading the page takes ~30 s with ~1,500 devices. The request blocks uWSGI, which eventually makes the Kubernetes liveness and readiness probes fail, so Kubernetes restarts the pods.

I profiled the metrics view with the following snippet (thanks to @Kircheneer):

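```python
# Assumes this runs inside `nautobot-server nbshell`, so Django settings and models are loaded.
import cProfile
import pstats

from django.contrib.auth import get_user_model
from django.test import RequestFactory
from nautobot.core.views import nautobot_metrics_view

User = get_user_model()

# Build a fake GET /metrics request authenticated as an existing user.
factory = RequestFactory()
request = factory.get("/metrics")
request.user = User.objects.get(username="some-poor-fellow")

# Profile a single call to the metrics view.
with cProfile.Profile() as pr:
    response = nautobot_metrics_view(request)

# Print every profiled function, sorted by cumulative time (including sub-calls).
stats = pstats.Stats(pr).sort_stats(pstats.SortKey.CUMULATIVE)
stats.print_stats()
```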

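The profile shows that `metrics_lcm_hw_end_of_support` is called 3 times, each call taking ~12 s (~36 s in total):

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.042    0.014   36.174   12.058 /usr/local/lib/python3.11/site-packages/nautobot_device_lifecycle_mgmt/metrics.py:115(metrics_lcm_hw_end_of_support)
```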
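Reading that line: the columns are ncalls, tottime, per-call tottime, cumtime and per-call cumtime. So 3 calls account for 36.174 s of cumulative time (12.058 s per call), while only 0.042 s is spent in the function body itself; the rest is spent in the functions it calls. The same numbers can be extracted programmatically with `pstats.Stats.get_stats_profile()` (Python 3.9+). The sketch below is self-contained: `metrics_lcm_hw_end_of_support_stub` and `fake_metrics_view` are hypothetical stand-ins for the real metric function and `nautobot_metrics_view`, used only so the example runs without Nautobot.

```python
import cProfile
import pstats
import time


def metrics_lcm_hw_end_of_support_stub():
    """Hypothetical stand-in for the app's metric function: just sleeps briefly."""
    time.sleep(0.01)


def fake_metrics_view():
    """Hypothetical stand-in for the metrics view: calls the metric function 3 times."""
    for _ in range(3):
        metrics_lcm_hw_end_of_support_stub()


def profile_matching(func, name_fragment):
    """Profile one call to func() and return {function name: (ncalls, cumtime)}
    for every profiled function whose name contains name_fragment."""
    with cProfile.Profile() as pr:
        func()
    profile = pstats.Stats(pr).get_stats_profile()
    return {
        name: (fp.ncalls, fp.cumtime)
        for name, fp in profile.func_profiles.items()
        if name_fragment in name
    }


if __name__ == "__main__":
    # ncalls is a string in pstats ("3", or "3/1" for recursive calls).
    print(profile_matching(fake_metrics_view, "metrics_lcm"))
```

On the real Nautobot profile, the equivalent call would filter on `"metrics_lcm_hw_end_of_support"` instead of the stub's name.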

Steps to Reproduce

  1. Add ~1,500 devices to Nautobot (with the Device Lifecycle Management app installed).
  2. Open the `/metrics` endpoint.

ubajze, Feb 06 '24 08:02

Do we know why the function is called 3 times? I would expect the framework to call it only once per fetch of `/metrics`.
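One way to answer this from an existing profile is `pstats.Stats.print_callers()`, which lists, for every function matching a regex restriction, the functions that called it and how many times. On the real profile that would be something like `stats.print_callers("metrics_lcm_hw_end_of_support")`. The self-contained sketch below uses hypothetical stand-ins (`metric_stub`, `first_caller`, `second_caller`, `view`) to show what the report reveals when one function is reached through two different code paths.

```python
import cProfile
import io
import pstats


def metric_stub():
    """Hypothetical stand-in for the metric function under investigation."""
    return sum(range(1000))


def first_caller():
    """Hypothetical code path that calls the metric function once."""
    metric_stub()


def second_caller():
    """Hypothetical code path that calls the metric function twice."""
    metric_stub()
    metric_stub()


def view():
    """Hypothetical view that reaches metric_stub through two different paths."""
    first_caller()
    second_caller()


def callers_report(func, restriction):
    """Profile one call to func() and return the print_callers() text for
    functions whose 'file:line(name)' string matches the regex restriction."""
    with cProfile.Profile() as pr:
        func()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).print_callers(restriction)
    return buf.getvalue()


if __name__ == "__main__":
    print(callers_report(view, "metric_stub"))
```

Each caller line in the report carries its own call count, so three calls split as 1 + 2 across two callers point to two separate code paths rather than one loop.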

progala, Feb 19 '24 14:02

@ubajze @Kircheneer I'm having trouble reproducing this. On a local instance with 10,000 devices, the profiling code gives:

```
ncalls  tottime    percall   cumtime  percall   filename:lineno(function)
3       0.0002541  8.47e-05  0.02821  0.009404  metrics.py:115(metrics_lcm_hw_end_of_support)
```


Can you tell me how many inventory items and how many HardwareLCM objects you have?

progala, Feb 23 '24 11:02

In new versions of DLM, its metrics will be disabled by default. Operators will be able to enable individual metrics selectively, if desired.
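The opt-in model described above can be sketched as a registry of metric functions filtered by an operator-supplied allow-list, where the default is an empty list so nothing runs unless explicitly enabled. This is only an illustration of the idea: `ALL_METRICS`, the metric names, and the `enabled` argument are hypothetical and are not DLM's actual configuration keys or API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

MetricFn = Callable[[], Iterator[str]]


def hw_end_of_support() -> Iterator[str]:
    """Hypothetical metric generator (stands in for an expensive DB-backed metric)."""
    yield "hw_end_of_support"


def software_compliance() -> Iterator[str]:
    """Hypothetical metric generator."""
    yield "software_compliance"


# Hypothetical registry of every metric the app knows how to produce.
ALL_METRICS: Dict[str, MetricFn] = {
    "hw_end_of_support": hw_end_of_support,
    "software_compliance": software_compliance,
}


def select_metrics(enabled: Iterable[str] = ()) -> List[MetricFn]:
    """Return only the metric functions the operator opted into.

    The default is an empty allow-list, i.e. all metrics disabled. Unknown
    names raise ValueError so configuration typos are not silently ignored.
    """
    enabled = list(enabled)
    unknown = set(enabled).difference(ALL_METRICS)
    if unknown:
        raise ValueError(f"Unknown metric(s): {sorted(unknown)}")
    return [ALL_METRICS[name] for name in enabled]


def collect(enabled: Iterable[str] = ()) -> List[str]:
    """Run only the enabled metric generators and gather their output."""
    return [sample for fn in select_metrics(enabled) for sample in fn()]
```

With this design a scrape never pays for a metric nobody asked for, so an expensive one such as the hardware end-of-support metric only runs when it is explicitly enabled.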

progala, Sep 02 '24 15:09