Performance Improvement Review: Dojo Edit finding takes 50 seconds to complete
Problem description Opening Edit Finding from the UI takes 50 seconds to complete. How can performance be improved here? The system has 10K engagements and 100K findings.
Steps to reproduce Steps to reproduce the behavior:
- Go to Findings
- Click on Finding
- Select a finding
- See time lag
Expected behavior The Edit Finding page should open quickly; instead it takes a very long time to load.
Deployment method (select with an X)
- [X] Docker Compose
- [ ] Kubernetes
- [ ] GoDojo
Environment information
- Operating System: Oracle Linux 7
- DefectDojo version (see footer): 2.29.4 (release mode)
Logs
In MySQL, the query below gets triggered (captured from the slow query log):
# Time: 2024-06-03T05:05:11.959378Z
# User@Host: defectdojo[defectdojo] @ defectdojo.host.com [10x.10y.1mn.1qr] Id: 3691498
# Query_time: 3.210302 Lock_time: 0.000005 Rows_sent: 97162 Rows_examined: 194324
SET timestamp=1717391108;
SELECT `dojo_endpoint`.`id`, `dojo_endpoint`.`protocol`, `dojo_endpoint`.`userinfo`, `dojo_endpoint`.`host`, `dojo_endpoint`.`port`, `dojo_endpoint`.`path`, `dojo_endpoint`.`query`, `dojo_endpoint`.`fragment`, `dojo_endpoint`.`product_id` FROM `dojo_endpoint` INNER JOIN `dojo_product` ON (`dojo_endpoint`.`product_id` = `dojo_product`.`id`) WHERE `dojo_endpoint`.`product_id` = 2 ORDER BY `dojo_product`.`name` ASC, `dojo_endpoint`.`host` ASC, `dojo_endpoint`.`protocol` ASC, `dojo_endpoint`.`port` ASC, `dojo_endpoint`.`userinfo` ASC, `dojo_endpoint`.`path` ASC, `dojo_endpoint`.`query` ASC, `dojo_endpoint`.`fragment` ASC;
The uwsgi logs show 50 seconds to serve the Edit Finding request:
[pid: 1901|app: -|req: -/-] 10.x.y.z ([email protected]) {60 vars in 1600 bytes} [Mon Jun 3 06:11:16 2024] GET /finding/1059966/edit => generated 12856718 bytes in 50006 msecs (HTTP/1.1 200) 8 headers in 372 bytes (25 switches on core 0)
Loading the all-findings list results in the query below, which takes about 36 seconds to return 25 findings:
`SELECT `dojo_finding`.`id`, `dojo_finding`.`title`, `dojo_finding`.`date`, `dojo_finding`.`sla_start_date`, `dojo_finding`.`cwe`, `dojo_finding`.`cve`, `dojo_finding`.`cvssv3`, `dojo_finding`.`cvssv3_score`, `dojo_finding`.`url`, `dojo_finding`.`severity`, `dojo_finding`.`description`, `dojo_finding`.`mitigation`, `dojo_finding`.`impact`, `dojo_finding`.`steps_to_reproduce`, `dojo_finding`.`severity_justification`, `dojo_finding`.`refs`, `dojo_finding`.`test_id`, `dojo_finding`.`active`, `dojo_finding`.`verified`, `dojo_finding`.`false_p`, `dojo_finding`.`duplicate`, `dojo_finding`.`duplicate_finding_id`, `dojo_finding`.`out_of_scope`, `dojo_finding`.`risk_accepted`, `dojo_finding`.`under_review`, `dojo_finding`.`last_status_update`, `dojo_finding`.`review_requested_by_id`, `dojo_finding`.`under_defect_review`, `dojo_finding`.`defect_review_requested_by_id`, `dojo_finding`.`is_mitigated`, `dojo_finding`.`thread_id`, `dojo_finding`.`mitigated`, `dojo_finding`.`mitigated_by_id`, `dojo_finding`.`reporter_id`, `dojo_finding`.`numerical_severity`, `dojo_finding`.`last_reviewed`, `dojo_finding`.`last_reviewed_by_id`, `dojo_finding`.`param`, `dojo_finding`.`payload`, `dojo_finding`.`hash_code`, `dojo_finding`.`line`, `dojo_finding`.`file_path`, `dojo_finding`.`component_name`, `dojo_finding`.`component_version`, `dojo_finding`.`static_finding`, `dojo_finding`.`dynamic_finding`, `dojo_finding`.`created`, `dojo_finding`.`scanner_confidence`, `dojo_finding`.`sonarqube_issue_id`, `dojo_finding`.`unique_id_from_tool`, `dojo_finding`.`vuln_id_from_tool`, `dojo_finding`.`sast_source_object`, `dojo_finding`.`sast_sink_object`, `dojo_finding`.`sast_source_line`, `dojo_finding`.`sast_source_file_path`, `dojo_finding`.`nb_occurences`, `dojo_finding`.`publish_date`, `dojo_finding`.`service`, `dojo_finding`.`planned_remediation_date`, `dojo_finding`.`planned_remediation_version`, `dojo_finding`.`effort_for_fixing`, COUNT(CASE WHEN `dojo_endpoint_status`.`mitigated` = 0 THEN `dojo_endpoint_status`.`id` ELSE NULL END) AS `active_endpoint_count`, COUNT(CASE WHEN `dojo_endpoint_status`.`mitigated` = 1 THEN `dojo_endpoint_status`.`id` ELSE NULL END) AS `mitigated_endpoint_count` FROM `dojo_finding` LEFT OUTER JOIN `dojo_endpoint_status` ON (`dojo_finding`.`id` = `dojo_endpoint_status`.`finding_id`) GROUP BY `dojo_finding`.`id` ORDER BY `dojo_finding`.`numerical_severity` ASC LIMIT 25;`
25 rows in set (35.40 sec)
The corresponding uwsgi log shows about 42 seconds:
[pid: 1897|app: -|req: -/-] 10.x.y.z ([email protected]) {60 vars in 1568 bytes} [Mon Jun 3 06:20:54 2024] GET /finding => generated 3207729 bytes in 41669 msecs (HTTP/1.1 200) 9 headers in 516 bytes (12 switches on core 0)
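To pin down which view code issues these slow statements, one option is to temporarily enable Django's SQL query logging in a non-production copy of the instance. A minimal sketch follows; the settings-override mechanism depends on how your DefectDojo deployment is configured, and `django.db.backends` only emits per-query timings while DEBUG is true, so this is for test environments only:

```python
# Settings-override sketch for a test instance only; django.db.backends logs
# every SQL statement with its execution time, but only while DEBUG = True.
DEBUG = True

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```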
+1 we are also facing the same issue.
Couple of things to point out here:
- You don't mention whether the DB runs as a container on the same host or on a separate host. Moving the DB off the host/VM that the application (and other stuff) runs on will help with performance. A decent example of doing that is here.
- You don't say what resources you have for the install. Obviously more compute (CPU/RAM) always helps.
- There have been some pretty significant performance improvements in recent versions, so updating to the latest is also a best practice.
- MySQL will no longer be a supported DB for DefectDojo as of the July release (2.36.0) and will be removed from the dev branch this month. See my discussion of deprecations: https://github.com/DefectDojo/django-DefectDojo/discussions/9690
- The MySQL DB is being run on a different VM
- The VM hosting Dojo has plenty of resources - 16 CPU, 100 GB memory
- We will try the latest version
- What would be the migration plan for existing Dojo data from MySQL to another DB? Is there a verified process that can be followed?
Look at the quarterly update in my previous comment for a link to a GH discussion on migrating to Postgres.
I haven't run MySQL in 8+ years, so I'm not in a place where I can provide advice. There's a very nice DB migration tool in that discussion thread that looks like it should 'just work'.
As always, do backups first.
I wasn't able to get the migration script to work without some tweaks. I plan on sharing those tweaks after I finish testing.
@37b
> I wasn't able to get the migration script to work without some tweaks. I plan on sharing those tweaks after I finish testing.
Sharing that would be greatly appreciated. If possible, please add it to this discussion on migrating from MySQL to Postgres at https://github.com/DefectDojo/django-DefectDojo/discussions/9480
That discussion also mentions the pgloader tool - I've not tried it but it seems like something very useful.
@mtesauro Kindly reopen the discussion https://github.com/DefectDojo/django-DefectDojo/discussions/9480
@navzen2000
> @mtesauro Kindly reopen the discussion #9480
Done - though people have been commenting in there with it closed so I didn't think that was a blocker.
+1 we are also facing the same issue. (PostgreSQL - separate host)
FWIW, I can say I'm NOT seeing this issue. DefectDojo 2.38.1 (compose) with Postgres on a separate host. Edit finding is sub-second response time.
I also tested the public demo server, which is using Postgres as a container as part of the docker compose running on the same small VM. Those render times are also sub-second.
And that demo server is running on a 2 CPU / 4 GB RAM single VM, so it's not like there's loads of resources behind that install. If you run cat /proc/cpuinfo, it shows 2 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz cores, so nothing to brag about really.
Maybe give PGTune a try? https://pgtune.leopard.in.ua/
How many Findings do you have on the server? We currently have a total of over 841,791 findings.
Slowest individual queries
SELECT "dojo_finding"."id", "dojo_finding"."title", "dojo_finding"."date", "dojo_finding"."sla_start_date", "dojo_finding"."sla_expiration_date", "dojo_finding"."cwe", "dojo_finding"."cve", "dojo_finding"."epss_score", "dojo_finding"."epss_percentile", "dojo_finding"."cvssv3", "dojo_finding"."cvssv3_score", "dojo_finding"."url", "dojo_finding"."severity", "dojo_finding"."description", "dojo_finding"."mitigation", "dojo_finding"."impact", "dojo_finding"."steps_to_reproduce", "dojo_finding"."severity_justification", "dojo_finding"."refs", "dojo_finding"."test_id", "dojo_finding"."active", "dojo_finding"."verified", "dojo_finding"."false_p", "dojo_finding"."duplicate", "dojo_finding"."duplicate_finding_id", "dojo_finding"."out_of_scope", "dojo_finding"."risk_accepted", "dojo_finding"."under_review", "dojo_finding"."last_status_update", "dojo_finding"."review_requested_by_id", "dojo_finding"."under_defect_review", "dojo_finding"."defect_review_requested_by_id", "dojo_finding"."is_mitigated", "dojo_finding"."thread_id", "dojo_finding"."mitigated", "dojo_finding"."mitigated_by_id", "dojo_finding"."reporter_id", "dojo_finding"."numerical_severity", "dojo_finding"."last_reviewed", "dojo_finding"."last_reviewed_by_id", "dojo_finding"."param", "dojo_finding"."payload", "dojo_finding"."hash_code", "dojo_finding"."line", "dojo_finding"."file_path", "dojo_finding"."component_name", "dojo_finding"."component_version", "dojo_finding"."static_finding", "dojo_finding"."dynamic_finding", "dojo_finding"."created", "dojo_finding"."scanner_confidence", "dojo_finding"."sonarqube_issue_id", "dojo_finding"."unique_id_from_tool", "dojo_finding"."vuln_id_from_tool", "dojo_finding"."sast_source_object", "dojo_finding"."sast_sink_object", "dojo_finding"."sast_source_line", "dojo_finding"."sast_source_file_path", "dojo_finding"."nb_occurences", "dojo_finding"."publish_date", "dojo_finding"."service", "dojo_finding"."planned_remediation_date", "dojo_finding"."planned_remediation_version", "dojo_finding"."effort_for_fixing", COUNT("dojo_endpoint_status"."id") FILTER (WHERE NOT "dojo_endpoint_status"."mitigated") AS "active_endpoint_count", COUNT("dojo_endpoint_status"."id") FILTER (WHERE "dojo_endpoint_status"."mitigated") AS "mitigated_endpoint_count" FROM "dojo_finding" LEFT OUTER JOIN "dojo_endpoint_status" ON ("dojo_finding"."id" = "dojo_endpoint_status"."finding_id") WHERE "dojo_finding"."active" GROUP BY "dojo_finding"."id" ORDER BY "dojo_finding"."numerical_severity" ASC LIMIT 25;
"ORDER BY" in the given selects is the problem why the query is so slow. Without "ORDER BY" the select is solved within a few ms
The REST API is as slow as the UI.
HW: K8s, 4x workers, 64 GB RAM, 32 vCPU
celery-beat:
  replicas: 2
  resources:
    requests:
      cpu: 2000m
      memory: 3Gi
    limits:
      cpu: 4000m
      memory: 8Gi
celery-worker:
  replicas: 2
  resources:
    requests:
      cpu: 2000m
      memory: 2Gi
    limits:
      cpu: 8000m
      memory: 8Gi
  app_settings:
    # pool_type: solo
    pool_type: prefork
    autoscale_min: 30
    autoscale_max: 60
    concurrency: 40
    prefetch_multiplier: 128
defectdojo-django:
  replicas: 2
  resources:
    requests:
      cpu: 3000m
      memory: 6Gi
    limits:
      cpu: 8000m
      memory: 12Gi
  app_settings:
    processes: 32
    threads: 32
Same issue here (recently updated to the latest release). 3 CPU + 8 GB RAM. PostgreSQL on a different VM.
Similar issue here (latest 2.46.0); opening a finding takes about 5-8 seconds. 46,483 findings total.
In 2.49.0 most queries displaying lists of things have been rewritten to use correlated subqueries instead of counts. We've seen considerable performance improvements in larger instances. Could everyone try again with the latest version of DefectDojo so we can see what issues remain? In the past we've seen slowness in instances with lots of engagements or tests, as in some places these are all loaded into dropdown boxes in the filter sections of the UI. Switching to "string based matching" in system settings will help there.
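For reference, a rough ORM-level sketch of what such a correlated-subquery rewrite looks like. This is illustrative only, not the actual 2.49.0 code; it assumes Finding and Endpoint_Status models with a finding foreign key and a mitigated flag, matching the dojo_endpoint_status table shown earlier in this thread:

```python
from django.db.models import Count, IntegerField, OuterRef, Subquery
from django.db.models.functions import Coalesce

from dojo.models import Endpoint_Status, Finding  # model names assumed

# One correlated subquery per counter instead of a LEFT JOIN over every
# endpoint-status row followed by a GROUP BY on all finding columns.
active_endpoints = (
    Endpoint_Status.objects.filter(finding=OuterRef("pk"), mitigated=False)
    .order_by()
    .values("finding")
    .annotate(c=Count("pk"))
    .values("c")
)

findings = (
    Finding.objects.filter(active=True)
    .annotate(
        active_endpoint_count=Coalesce(
            Subquery(active_endpoints, output_field=IntegerField()), 0
        )
    )
    .order_by("numerical_severity")[:25]
)
```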
I'd like to work on this issue.
My approach: refactor the Product_Tab class in dojo/utils.py to use @cached_property for lazy loading instead of executing all database queries eagerly in __init__. This will defer expensive COUNT queries until they're actually accessed.
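A minimal sketch of what that could look like; the attribute names, the constructor signature, and the example queries below are illustrative, not the real Product_Tab internals:

```python
from django.utils.functional import cached_property

from dojo.models import Engagement, Finding  # import paths assumed


class Product_Tab:
    def __init__(self, request, product, title=None, tab=None):
        # Store references only; no COUNT queries run here any more.
        self.request = request
        self.product = product
        self.title = title
        self.tab = tab

    @cached_property
    def engagement_count(self):
        # Executed only if a template actually renders this value,
        # and cached for the lifetime of this Product_Tab instance.
        return Engagement.objects.filter(product=self.product, active=True).count()

    @cached_property
    def open_findings_count(self):
        return Finding.objects.filter(
            test__engagement__product=self.product, active=True
        ).count()
```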
Let me know if this approach sounds reasonable.
I think the PR is an improvement, but I would be surprised if that is/was the culprit. I have a 120k-finding instance that has normal performance. Did you see those queries take a long time? @Vincent-Ngobeh