Convert `DIRECT_DEPENDENCIES` columns to `JSONB`
Current Behavior
The DIRECT_DEPENDENCIES column in the PROJECT and COLUMN tables are currently of type TEXT. For the COMPONENT table, there is a GIN_TRGM_OPS index on the column:
https://github.com/DependencyTrack/hyades/blob/96af03d19b004a2c41763dabbb7d082180c99e3c/commons-persistence/src/main/resources/schema.sql#L809-L810
In order to perform dependency graph traversal, queries are using LIKE '%...%' conditions such as:
"COMPONENT"."DIRECT_DEPENDENCIES" LIKE '%48d6c78d-5099-48c8-9bab-2038a5f177e4%'
While this works, there are more efficient ways to achieve the desired outcome in PostgreSQL.
Proposed Behavior
Convert the DIRECT_DEPENDENCIES column in the PROJECT and COMPONENT tables to JSONB.
Drop the existing COMPONENT_DIRECT_DEPENDENCIES_GIN_IDX index, and create a new one using JSONB_PATH_OPS, for example:
CREATE INDEX "COMPONENT_DIRECT_DEPENDENCIES_JSONB_IDX" ON "COMPONENT"
USING GIN("DIRECT_DEPENDENCIES" JSONB_PATH_OPS);
It will then be possible to leverage that index using the JSONB contains operator:
SELECT "NAME"
FROM "COMPONENT"
WHERE "DIRECT_DEPENDENCIES" @> '[{"uuid": "92efe3bd-808d-413a-993e-689c1f35713a"}]'::jsonb;
[!WARNING] To make DataNucleus work with JSONB columns,
insert-functionandupdate-functionextensions must be configured: https://github.com/DependencyTrack/hyades/issues/1074#issuecomment-2198219236
[!WARNING] There might be multiple queries across the code base that perform
LIKEqueries on theDIRECT_DEPENDENCIEScolumn. Some may use JDOQL (i.e.directDependencies.contains("...")). In particular the latter is problematic, since JDOQL doesn't support the@>JSON operator. Track those queries down, and investigate whether they can be refactored accordingly.
Demo
[!NOTE] The test database contains 1000 projects and TODO components. Generated testdata BOMs were used to populate the database.
Query plan using existing TEXT column:
+--------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
|Bitmap Heap Scan on "COMPONENT" (cost=621.77..665.18 rows=11 width=14) (actual time=7.739..17.003 rows=3 loops=1) |
| Recheck Cond: ("DIRECT_DEPENDENCIES" ~~ '%ed521419-f3ac-4331-ac93-7219b5420133%'::text) |
| Rows Removed by Index Recheck: 14 |
| Heap Blocks: exact=17 |
| -> Bitmap Index Scan on "COMPONENT_DIRECT_DEPENDENCIES_GIN_IDX" (cost=0.00..621.77 rows=11 width=0) (actual time=7.551..7.551 rows=17 loops=1)|
| Index Cond: ("DIRECT_DEPENDENCIES" ~~ '%ed521419-f3ac-4331-ac93-7219b5420133%'::text) |
|Planning Time: 0.427 ms |
|Execution Time: 17.213 ms |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
Size of index COMPONENT_DIRECT_DEPENDENCIES_GIN_IDX:
SELECT pg_size_pretty(pg_relation_size(indexrelid)) "Index Size"
FROM pg_stat_all_indexes i
INNER JOIN pg_class c
ON i.relid=c.oid
WHERE i.relname='COMPONENT'
AND indexrelname = 'COMPONENT_DIRECT_DEPENDENCIES_GIN_IDX';
106 MB
Query plan using JSONB column:
+----------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
|Bitmap Heap Scan on "COMPONENT" (cost=41.29..10042.07 rows=3823 width=14) (actual time=0.130..0.135 rows=3 loops=1) |
| Recheck Cond: ("DIRECT_DEPENDENCIES" @> '[{"uuid": "ed521419-f3ac-4331-ac93-7219b5420133"}]'::jsonb) |
| Heap Blocks: exact=3 |
| -> Bitmap Index Scan on "COMPONENT_DIRECT_DEPENDENCIES_JSONB_IDX" (cost=0.00..40.33 rows=3823 width=0) (actual time=0.010..0.010 rows=3 loops=1)|
| Index Cond: ("DIRECT_DEPENDENCIES" @> '[{"uuid": "ed521419-f3ac-4331-ac93-7219b5420133"}]'::jsonb) |
|Planning Time: 0.100 ms |
|Execution Time: 0.163 ms |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
Size of index COMPONENT_DIRECT_DEPENDENCIES_JSONB_IDX:
SELECT pg_size_pretty(pg_relation_size(indexrelid)) "Index Size"
FROM pg_stat_all_indexes i
INNER JOIN pg_class c
ON i.relid=c.oid
WHERE i.relname='COMPONENT'
AND indexrelname = 'COMPONENT_DIRECT_DEPENDENCIES_JSONB_IDX';
21 MB
Checklist
- [X] I have read and understand the contributing guidelines
- [X] I have checked the existing issues for whether this enhancement was already requested
For JDOQL queries, we could create a DataNucleus extension to support the @> operator: https://www.datanucleus.org/products/accessplatform_6_0/extensions/extensions.html#rdbms_sql_method
DataNucleus plugin for supporting the @> operator in JDOQL: https://github.com/nscuro/datanucleus-postgresql
Adds a jsonbContains method to the String type, can be used like this:
final var person = new Person();
person.setProperties(/* language=JSON */ """
[
{
"foo": "bar",
"baz": 111
}
]
""");
pm.makePersistent(person);
final Query<Person> query = pm.newQuery(Person.class);
query.setFilter("properties.jsonbContains(:foo)");
query.setParameters("[{\"baz\":111}]");
final Person queryResult = query.executeUnique();
assertThat(queryResult).isNotNull();