dependency-track
dependency-track copied to clipboard
It's not possible to import components with PURLs with more than 255 characters
Current Behavior:
Due to the chosen database schema, Dependency-Track is not able to import components with package URLs that have a length of 255 characters or more. See https://github.com/CycloneDX/cyclonedx-node-npm/issues/224
Steps to Reproduce:
Upload a BOM that contains a component with a long (>255 characters) PURL.
Expected Behavior:
Dependency-Track should be able to import components with PURLs of arbitrary length.
Environment:
- Dependency-Track Version: 4.6.1
- Distribution: N/A
- BOM Format & Version: N/A
- Database Server: N/A
- Browser: N/A
Additional Details:
The "COMPONENT"."PURL"
column is indexed. Migrating it from VARCHAR(255)
to MEDIUMTEXT
, TEXT
or similar data types may have a strong impact on query performance.
I've seen the same thing happen on our end.
Thankfully, the cyclonedx-npm tool supports a --short-PURLs argument which shortens the URLs at the cost of losing some information. It's a stopgap until this is fixed.
I have the same issue with the DESCRIPTION
field of a project. Dependency-Track returns HTTP_500
on POST and PUT /api/v1/project
, whenever the project's description exceed 255 characters.
Would it be useful if we trim the filepath, checksum, repository etc from the purl before saving it to database. And keep the full purl in the details field of the component, for example, saying that this is the original purl. We could also put any hashes found in the hashes field of the component etc
Current bug as per testing in Postgres
The index defined on component's purl (COMPONENT_PURL_IDX) is not used in any query using purl match (either equals or LIKE). Screenshots proving the same :
Tested solution
We can remove max length from purl and make it VARCHAR with length 1024.
@Column(name = "PURL", jdbcType = "VARCHAR", length = 1024)
To improve query performance, we can create index on method directly in database (cannot be done via annotation) using either :
-
GIN index to use search using LIKE %purlString%
create index "COMPONENT_PURL_IDX" on "COMPONENT" using gin (lower("PURL"::text) gin_trgm_ops);
-
BTREE index to use search using LIKE purlString% or equals comparison
create index "COMPONENT_PURL_IDX" on "COMPONENT" (lower("PURL"::text) text_pattern_ops);
Execution time of same query without using method index : ~40s Execution time of same query with using method index : ~10s
Screenshot proving use of index :
Any update ?
FWIW, if you're running Postgres, you can simply ALTER
the column definitions from VARCHAR(255)
to VARCHAR(somelargenumber)
or even TEXT
(which is unlimited length). Postgres stores VARCHAR(X)
and TEXT
the same way, converting between them is a seamless operation.
We have yet to find a good way to address this in the code, such that it works for all supported databases, and does not break any indexes.
Just altered all PURL*
columns to TEXT
and the problem persist
It seems to be a check done by datanucleus
not a database exception
The size
needs to be altered on purl string in the model. Otherwise, input validation will likely fail. These are the places that I know of that will need to change. There may be others.
- https://github.com/DependencyTrack/dependency-track/blob/4.8.2/src/main/java/org/dependencytrack/model/Component.java#L242
- https://github.com/DependencyTrack/dependency-track/blob/4.8.2/src/main/java/org/dependencytrack/model/Component.java#L249
- https://github.com/DependencyTrack/dependency-track/blob/4.8.2/src/main/java/org/dependencytrack/model/Project.java#L179
There are actually quite a few things other than just the PURL that are currently limited to 255 characters in the database and/or in the input parsing. See this query for places where the @Size
attribute is used.
Whilst this issue has now been assigned to 5.x milestone, it could make it into v4.x if someone were to submit a PR. Just as long as it works across all RDBMSs.
We can remove the @Size
annotations from PURL fields, which means that users can manually change their schema to accommodate for larger values. As mentioned above, for PostgreSQL it won't be a problem.
But to my knowledge MySQL requires a key length when creating indexes on text columns, and MSSQL also has some special behavior when indexing unlimited text. Because the ORM we still use does not allow us to specify specific indexes to use, per default it will create indexes that are plain useless on at least 2 of the supported RDBMSes. So the problem remains that we can't offer a general solution without breaking things for certain RDBMSes.
Facing the same issue.
My fix is to modify SBOM purl with python. maybe useful for some one:
from packageurl import PackageURL purl = PackageURL.from_string(value) purl.qualifiers['download_url'] = '' some_dict[key] = PackageURL.to_string(purl)
this make this: pkg:npm/%40babel/[email protected]?download_url=https%3A//myrepomanager/path/%40babel/plugin-transform-react-display-name/-/plugin-transform-react-display-name-7.23.3.tgz#packages/babel-plugin-transform-react-display-name
->
pkg:npm/%40babel/[email protected]#packages/babel-plugin-transform-react-display-name
and worked fine
I had another look at this:
- MySQL has an index key length limit of 192 when using unicode collation
- MSSQL < 2016 has an index key length limit of 900, both for clustered and non-clustered indexes (https://dba.stackexchange.com/a/208019)
- MSSQL >= 2016 has an index key length limit of 1700 for non-clustered indexes
- PostgreSQL and H2 have no index key length limit at all
- All this would not be an issue if the ORM would allow us to specify an index key length based on which RDBMS we're running on, but it doesn't (https://github.com/DependencyTrack/dependency-track/issues/271#issuecomment-1250131006)
For MySQL, the above means that even the current database schema is incompatible when using Unicode collation. Even a PURL column defined as VARCHAR(255)
exceeds the limit of 192. It is already the case that manual changes are required to make DT work with MySQL, see: https://github.com/DependencyTrack/dependency-track/issues/271#issuecomment-1108923693
Given the MSSQL constraint, what we can do is we can increase the PURL length limit from 255 to 786, which is less than the lowest index key length limit of 900. This means that everyone except MySQL users will automatically benefit from this.
786 characters should be plenty. We can re-asses the situation if that still causes issues.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.