django-DefectDojo icon indicating copy to clipboard operation
django-DefectDojo copied to clipboard

Bad performace. Engagement page takes ~50 s when tests have many findings — replace DISTINCT counts with Subquery counters in `prefetch_for_view_tests`

Open DenysMoskalenko opened this issue 6 months ago • 2 comments

Bug description

Opening the “View Engagement” page for an engagement that contains tests with ≈ 20 000 findings and > 1 000 re-imports takes ~50 s.
prefetch_for_view_tests() counts findings with Count(..., distinct=True) after joining dojo_finding and dojo_test_import, which explodes to ~25 M rows before grouping.


Steps to reproduce

  1. Import or create an engagement that has a test with ~20 000 findings (e.g. large SARIF or Gitleaks scan) and many re-imports.
  2. Navigate to Product → Engagements → View Engagement (/engagement/<id>).
  3. Observe 45–50 s page load; PostgreSQL shows heavy temp-file usage.

Expected behavior

The page should render in under a second regardless of findings volume.


Deployment method

  • [X] Docker Compose
  • [x] Kubernetes
  • [ ] GoDojo

Environment information

  • OS / container base : Debian 12 (official images)
  • Docker compose : v2.24.4
  • DefectDojo commit : [2025-06-08] 6e5a3d0 (current dev)
  • PostgreSQL : 14

Proposed fix

Replace the six Count(..., distinct=True) annotations in
dojo/views/engagement/views.py::prefetch_for_view_tests with index-only Subquery counters so each is evaluated once per test_id.
Execution time on the same dataset drops from ≈ 50 s → ≈ 200 ms.

from django.db.models import (
    Count, IntegerField, OuterRef, Subquery, Value, QuerySet
)
from django.db.models.functions import Coalesce
from dojo.models import Finding, Test_Import   # adjust import path if needed

def _count_subquery(qs):
    """Return a Subquery that yields one aggregated count per test_id."""
    return Subquery(
        model_qs.values("test_id").annotate(c=Count("*")).values("c")[:1],  # one row per test_id
        output_field=IntegerField(),
    )

def prefetch_for_view_tests(tests: QuerySet) -> QuerySet:
    if not isinstance(tests, QuerySet):
        logger.warning("unable to prefetch because query was already executed")
        return tests

    qs = tests.select_related("lead", "test_type").prefetch_related("tags", "notes")

    base_findings = Finding.objects.filter(test_id=OuterRef("pk"))
    qs = qs.annotate(
        count_findings_test_all=Coalesce(_count_subquery(base_findings), Value(0)),
        count_findings_test_active=Coalesce(_count_subquery(base_findings.filter(active=True)), Value(0)),
        count_findings_test_active_verified=Coalesce(
            _count_subquery(base_findings.filter(active=True, verified=True)), Value(0)
        ),
        count_findings_test_mitigated=Coalesce(_count_subquery(base_findings.filter(is_mitigated=True)), Value(0)),
        count_findings_test_dups=Coalesce(_count_subquery(base_findings.filter(duplicate=True)), Value(0)),
        total_reimport_count=Coalesce(
            _count_subquery(Test_Import.objects.filter(test_id=OuterRef("pk"), type=Test_Import.REIMPORT_TYPE)),
            Value(0),
        ),
    )
    return qs

Sample scan files

Not required; any SARIF or Gitleaks scan with ≈ 20 000 findings reproduces the issue.

Additional context

  • I’m happy to open a PR with the patch.

DenysMoskalenko avatar Jun 09 '25 16:06 DenysMoskalenko

@DenysMoskalenko Thank, a PR would be very welcome. I remember from long ago that we occasionally ran into these scenario's where multiple joins combined with counts resulted in poor performance. There might be other places where a similar improvement can be made!

We may want to take a little time to evaluate/review the PR, could you base it against dev?

valentijnscholten avatar Jun 09 '25 16:06 valentijnscholten

I found a few more places where we have a problem, I'll open PRs for those too.

DenysMoskalenko avatar Jun 10 '25 16:06 DenysMoskalenko

FIxed in 2.48.0

valentijnscholten avatar Jul 07 '25 18:07 valentijnscholten