dependency-track icon indicating copy to clipboard operation
dependency-track copied to clipboard

High CPU load caused by commonly executed queries

Open gbennett-squarespace opened this issue 8 months ago • 4 comments

Current Behavior

All the profiling/data is being done via GCP Query Insights, as I've been trying to track down causes for high CPU utilization in our environment. Apologies for the length. :) I'm going to truncate some query details (the individual columns) as this is quite long.

High CPU utilization is being seen by this query:

SELECT

FROM
  "DEPENDENCYMETRICS" "A0"
WHERE
  "A0"."COMPONENT_ID" = $1

The current indexes do not seem to cover this particular use case, and this query seems to be quite aggressively used. For our environment: Image

We are running in GCP on CloudSQL, currently on Postgres 17 but we also experienced this on 15 as well.

Adding an index to cover this particular query drastically reduces CPU load, you can see the exact point when the additional index was added (no changes were made to any of the other indexes): Image

I believe this is being (partially) caused by this particular query often finding no results (as would be often expected), but because it doesn't have a particularly close match on the index, the cost of it is quite high.

Here's the query plan before adding the additional index: Image

And after the index: Image

These are also the next two worst offenders (at least in our environment): Image

First one:

SELECT

FROM
  "COMPONENT" "A0"
WHERE
  "A0"."PROJECT_ID" = $1

The second (and a bunch of others) that all fall under this index also have high utilization.

When I zoom out on metrics and look at our total utilization (about 24 hours), these are the biggest causes: Image

Some are simple:

SELECT

FROM
  "PROJECTMETRICS" "A0"
WHERE
  "A0"."PROJECT_ID" = $1

Most are complex queries, but most of them hit the same indexes.

Steps to Reproduce

1.No steps to reproduce as this is being seen in a production environment.

Expected Behavior

Adding strategic indexes for queries with high cost reduces time it takes to run the query lowering load on the DB and improving user experience,

Dependency-Track Version

4.13.0

Dependency-Track Distribution

Container Image

Database Server

PostgreSQL

Database Server Version

17_4.R20250302.00_07

Browser

N/A

Checklist

gbennett-squarespace avatar Apr 24 '25 00:04 gbennett-squarespace

We also see permanent CPU load, is this issue and MR still alive?

Image it switches between idle and select.

Sometimes we have huge disk io spikes, and sometimes they don't need disk at all.

stiwa-wrre avatar Oct 31 '25 11:10 stiwa-wrre

@nscuro did you see any benefits with the indices?

Because we even got notifications from our storage Team that deptrack is getting crazy. But we only have 2181 Projects in deptrack. The latest 4.13.6 Update didn't brought any improvements. We still have idle postgres processes with very high cpu load.

The query in question is this one:

SELECT 'org.dependencytrack.model.VulnerableSoftware' AS "DN_TYPE","A0"."CPE22","A0"."CPE23","A0"."EDITION","A0"."ID","A0"."LANGUAGE","A0"."OTHER","A0"."PART","A0"."PRODUCT","A0"."PURL","A0"."PURL_NAME","A0"."PURL_NAMESPACE","A0"."PURL_QUALIFIERS","A0"."PURL_SUBPATH","A0"."PURL_TYPE","A0"."PURL_VERSION","A0"."SWEDITION","A0"."TARGETHW","A0"."TARGETSW","A0"."UPDATE","A0"."UUID","A0"."VENDOR","A0"."VERSION","A0"."VERSIONENDEXCLUDING","A0"."VERSIONENDINCLUDING","A0"."VERSIONSTARTEXCLUDING","A0"."VERSIONSTARTINCLUDING","A0"."VULNERABLE" FROM "VULNERABLESOFTWARE" "A0" WHERE "A0"."PURL_TYPE" = $1 AND "A0"."PURL_NAME" = $2 AND "A0"."PURL_NAMESPACE" = $3 AND "A0"."VERSION" = $4 AND "A0"."VERSIONENDEXCLUDING" IS NULL AND "A0"."VERSIONENDINCLUDING" IS NULL AND "A0"."VERSIONSTARTEXCLUDING" IS NULL AND "A0"."VERSIONSTARTINCLUDING" IS NULL FETCH NEXT ROW ONLY

It runs in a loop. When its done, it instantly spawns again.

Image Image

stiwa-wrre avatar Nov 17 '25 14:11 stiwa-wrre

@stiwa-wrre None of the proposed indexes are for that table.

Would it be possible for you to run the query you shared with EXPLAIN ANALYZE and share the result?

nscuro avatar Nov 17 '25 15:11 nscuro

@nscuro The explain:

                                                                                                     QUERY PLAN                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..66533.73 rows=1 width=2402) (actual time=4712.947..4712.949 rows=0 loops=1)
   ->  Index Scan using "VULNERABLESOFTWARE_PURL_TYPE_NS_NAME_IDX" on "VULNERABLESOFTWARE" "A0"  (cost=0.56..798398.60 rows=12 width=2402) (actual time=4712.944..4712.945 rows=0 loops=1)
         Index Cond: ((("PURL_TYPE")::text = 'deb'::text) AND (("PURL_NAMESPACE")::text = 'ubuntu'::text) AND (("PURL_NAME")::text = 'linux-aws'::text))
         Filter: (("VERSIONENDEXCLUDING" IS NULL) AND ("VERSIONENDINCLUDING" IS NULL) AND ("VERSIONSTARTEXCLUDING" IS NULL) AND ("VERSIONSTARTINCLUDING" IS NULL) AND (("VERSION")::text = '4.4.0-1149.155'::text))
         Rows Removed by Filter: 670010
 Planning Time: 0.911 ms
 Execution Time: 4713.001 ms
(7 rows)

ok we got a little bit deeper with glowroot now: these are the queries:

jdbc query: SELECT 'org.dependencytrack.model.VulnerableSoftware' AS "DN_TYPE","A0"."CPE22","A0"."CPE23","A0"."EDITION","A0"."ID","A0"."LANGUAGE","A0"."OTHER","A0"."PART","A0"."PRODUCT","A0"."PURL","A0"."PURL_NAME","A0"."PURL_NAMESPACE","A0"."PURL_QUALIFIERS","A0"."PURL_SUBPATH","A0"."PURL_TYPE","A0"."PURL_VERSION","A0"."SWEDITION","A0"."TARGETHW","A0"."TARGETSW","A0"."UPDATE","A0"."UUID","A0"."VENDOR","A0"."VERSION","A0"."VERSIONENDEXCLUDING","A0"."VERSIONENDINCLUDING","A0"."VERSIONSTARTEXCLUDING","A0"."VERSIONSTARTINCLUDING","A0"."VULNERABLE" FROM "VULNERABLESOFTWARE" "A0" WHERE "A0"."PURL_TYPE" = ? AND "A0"."PURL_NAME" = ? AND "A0"."PURL_NAMESPACE" = ? AND "A0"."VERSION" = ? AND "A0"."VERSIONENDEXCLUDING" IS NULL AND "A0"."VERSIONENDINCLUDING" IS NULL AND "A0"."VERSIONSTARTEXCLUDING" IS NULL AND "A0"."VERSIONSTARTINCLUDING" IS NULL FETCH NEXT ROW ONLY  ['deb', 'linux-aws', 'ubuntu', '5.4.0-1151.161'] => 0 rows
Image

We saw that OSV was causing the queries in the Stacktrace. See the Trace File below.

trace-20251117-154139-100.zip

This is also explains the issue that the query doesn't get finished in 12h, then osv triggers another round and so on. after some days we are full with running the same stuff all the time. You can see it in the graph. The query should finish way faster and no concurrent query should start when the other is running.

We disabled OSV for now and restarted the apiserver, db queries stopped and cpu is chilled now.

stiwa-wrre avatar Nov 17 '25 15:11 stiwa-wrre