Bad performance: Engagement page takes ~50 s when tests have many findings; replace DISTINCT counts with Subquery counters in `prefetch_for_view_tests`
Bug description
Opening the “View Engagement” page for an engagement that contains tests with ≈ 20 000 findings and > 1 000 re-imports takes ~50 s.
`prefetch_for_view_tests()` counts findings with `Count(..., distinct=True)` after joining `dojo_finding` and `dojo_test_import`, which explodes the intermediate join to ~25 M rows before grouping.
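For illustration, here is a simplified sketch of the annotation pattern in question (the reverse relation names `finding` and `test_import` are assumed from the table names above; the real function defines six such counters). Each additional `distinct=True` count joins another table onto `dojo_test`, so the pre-grouping row set grows multiplicatively with findings × imports:

```python
from django.db.models import Count


def annotate_counts_with_joins(tests):
    """Sketch of the problematic pattern: two distinct counts over two
    different reverse relations force a findings x imports join fan-out
    per test before the DISTINCT de-duplication runs."""
    return tests.annotate(
        count_findings_test_all=Count("finding__id", distinct=True),
        total_reimport_count=Count("test_import__id", distinct=True),
    )
```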
Steps to reproduce
- Import or create an engagement that has a test with ~20 000 findings (e.g. a large SARIF or Gitleaks scan) and many re-imports.
- Navigate to Product → Engagements → View Engagement (`/engagement/<id>`).
- Observe a 45–50 s page load; PostgreSQL shows heavy temp-file usage (one way to confirm this from the Django shell is sketched below).
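A sketch for confirming the temp-file pressure from a Django shell on the running instance. `pg_stat_database` is a standard PostgreSQL statistics view and its counters are cumulative, so sample them before and after loading the page:

```python
from django.db import connection


def pg_temp_usage():
    """Read cumulative temp-file counters for the current database from
    PostgreSQL's pg_stat_database view."""
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT temp_files, temp_bytes FROM pg_stat_database"
            " WHERE datname = current_database()"
        )
        return cursor.fetchone()


before_files, before_bytes = pg_temp_usage()
# ... load /engagement/<id> in the browser ...
after_files, after_bytes = pg_temp_usage()
print(f"temp files: +{after_files - before_files}, "
      f"temp bytes: +{after_bytes - before_bytes}")
```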
Expected behavior
The page should render in under a second regardless of findings volume.
Deployment method
- [x] Docker Compose
- [x] Kubernetes
- [ ] GoDojo
Environment information
- OS / container base: Debian 12 (official images)
- Docker Compose: v2.24.4
- DefectDojo commit: 6e5a3d0 (current dev, 2025-06-08)
- PostgreSQL: 14
Proposed fix
Replace the six `Count(..., distinct=True)` annotations in `dojo/views/engagement/views.py::prefetch_for_view_tests` with index-only `Subquery` counters, so each counter is evaluated once per `test_id`.
Execution time on the same dataset drops from ≈50 s to ≈200 ms.
```python
import logging

from django.db.models import (
    Count,
    IntegerField,
    OuterRef,
    QuerySet,
    Subquery,
    Value,
)
from django.db.models.functions import Coalesce

from dojo.models import Finding, Test_Import  # adjust import path if needed

logger = logging.getLogger(__name__)


def _count_subquery(model_qs):
    """Return a Subquery that yields one aggregated count per test_id."""
    return Subquery(
        model_qs.values("test_id").annotate(c=Count("*")).values("c")[:1],  # one row per test_id
        output_field=IntegerField(),
    )


def prefetch_for_view_tests(tests: QuerySet) -> QuerySet:
    if not isinstance(tests, QuerySet):
        logger.warning("unable to prefetch because query was already executed")
        return tests

    qs = tests.select_related("lead", "test_type").prefetch_related("tags", "notes")

    # Correlated on the outer test row; each filtered variant below becomes
    # an independent, index-friendly subselect instead of a multi-table join.
    base_findings = Finding.objects.filter(test_id=OuterRef("pk"))

    qs = qs.annotate(
        count_findings_test_all=Coalesce(_count_subquery(base_findings), Value(0)),
        count_findings_test_active=Coalesce(_count_subquery(base_findings.filter(active=True)), Value(0)),
        count_findings_test_active_verified=Coalesce(
            _count_subquery(base_findings.filter(active=True, verified=True)), Value(0),
        ),
        count_findings_test_mitigated=Coalesce(_count_subquery(base_findings.filter(is_mitigated=True)), Value(0)),
        count_findings_test_dups=Coalesce(_count_subquery(base_findings.filter(duplicate=True)), Value(0)),
        total_reimport_count=Coalesce(
            _count_subquery(Test_Import.objects.filter(test_id=OuterRef("pk"), type=Test_Import.REIMPORT_TYPE)),
            Value(0),
        ),
    )
    return qs
```
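To sanity-check the rewrite, the annotated queryset can be timed and its SQL inspected from the Django shell. A rough sketch, assuming the module path given above and with `ENGAGEMENT_ID` as a placeholder:

```python
import time

from dojo.models import Test
from dojo.views.engagement.views import prefetch_for_view_tests  # adjust to the actual module path

ENGAGEMENT_ID = 1  # placeholder: id of the large engagement

qs = prefetch_for_view_tests(Test.objects.filter(engagement_id=ENGAGEMENT_ID))
print(qs.query)  # should show correlated subselects, not DISTINCT counts over joins

start = time.monotonic()
rows = list(qs)  # force evaluation
print(f"{len(rows)} tests annotated in {time.monotonic() - start:.3f} s")
```

With the Subquery form, and assuming the usual foreign-key index on `dojo_finding.test_id`, PostgreSQL can satisfy each counter from the index instead of materializing the cross join, which is where the drop from ≈50 s to ≈200 ms comes from.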
Sample scan files
Not required; any SARIF or Gitleaks scan with ≈ 20 000 findings reproduces the issue.
Additional context
- I’m happy to open a PR with the patch.
@DenysMoskalenko Thanks, a PR would be very welcome. I remember from long ago that we occasionally ran into scenarios where multiple joins combined with counts resulted in poor performance. There might be other places where a similar improvement can be made!
We may want to take a little time to evaluate/review the PR; could you base it against dev?
I found a few more places where we have a problem, I'll open PRs for those too.
Fixed in 2.48.0