Decouple the concept of project from project versions
Current Behavior
Currently, multiple versions of the same project are represented as separate, independent records in the PROJECT table.
The concept of a project version is treated very loosely: the fact that two PROJECT records represent different versions of the same project is merely implied by both sharing the same NAME column.
This poses a challenge for features such as portfolio ACL, where it becomes increasingly impractical to explicitly grant access to individual project versions.
Proposed Behavior
Decouple project from project version. Enable features such as portfolio ACL to apply to projects, rather than individual project version.
Most of what is currently associated with a project would switch to being associated with a project version instead. A rough sketch:
erDiagram
project {
bigint id pk
text name
}
project_version {
bigint id pk
bigint project_id fk
text name
}
component {
bigint id pk
bigint project_version_id fk
text name
text version
text purl
}
team {
bigint id pk
text name
}
project_version |o--o{ component: "contains"
project_version }|--|| project: "is version of"
project }o--o{ team: "is accessible by"
Checklist
- [x] I have read and understand the contributing guidelines
- [x] I have checked the existing issues for whether this enhancement was already requested
FYI @jhoward-lm @ashearin. Needs more fleshing out obviously but raised it for now so we have it documented. Any opinions, input, or requests as to how this should look and behave is much appreciated.
A few questions:
- in the proposed
project_version, should thenamecolumn beversion? - would
project_versionhave its ownUUIDcolumn/field? - how would the relationship with other tables change? e.g.
ANALYSIS,BOM,DEPENDENCYMETRICS, etc.- I imagine this would be on a case-by-case basis to determine whether the relationship only makes sense for a specific version of a project or for the version-agnostic project as a whole, but in general I agree with your assessment in the issue description
Unless I'm mistaken, it seems like there should only be one entry for a particular version on a given project. Assuming version was the intended column name, would you be opposed to either:
- making
project_version's primary key a composite of theproject_idforeign key andversion - enforcing a unique index/constraint on
project_idandversion(if the auto-incrementing primary keyidcolumn is needed to satisfy a DataNucleus requirement or something)
---
title: Proposed project_version table options
---
erDiagram
"project_version (composite PK)" {
bigint project_id PK, FK
text version PK
}
"project_version (unique constraint)" {
bigint id PK
bigint project_id UK, FK
text version UK
}
Without some kind of unique constraint, it seems like it could end up with situations like
| id | project_id | version |
|---|---|---|
| 1 | 1 | v0.1.0 |
| 2 | 1 | v0.1.0 |
in the proposed
project_version, should thenamecolumn beversion?
It felt odd at first to name a column in project_version, version. But not too opinionated on this so naming it version is fine with me.
would project_version have its own UUID column/field?
Without having done deep investigation, I am anticipating most fields and logic of the current PROJECT table to transition to PROJECT_VERSION. If that turns out to be true, dropping the UUID would require a lot of refactoring since significant parts of the code are currently accepting and passing around project UUIDs. This includes REST API endpoints.
Generally I agree with your implied suggestion in that the name+version pair in itself should be plenty to uniquely identify a project version, though. As usual, not opposed to dropping the UUID, but need to investigate what'd be the impact.
how would the relationship with other tables change? e.g.
ANALYSIS,BOM,DEPENDENCYMETRICS, etc.
You answered this yourself IMO, not much to add at this point.
[...] it seems like there should only be one entry for a particular version on a given project
Correct.
For reference, here's how that's currently dealt with on the schema-level: https://github.com/DependencyTrack/hyades-apiserver/blob/c14fbce0667aa476a61b715a6ac79b1653a6f76c/src/main/resources/migration/changelog-v5.4.0.xml#L17-L30
making
project_version's primary key a composite of theproject_idforeign key andversion
This would be ideal IMO, but sadly a pain to make work with DataNucleus: https://www.datanucleus.org/products/accessplatform_6_0/jdo/mapping.html#application_identity
Would need to experiment with this, but my past attempt to make something similar work were not successful.
enforcing a unique index/constraint on
project_idandversion
This would basically be what we have now, and what I linked above. This is known to work, although it might not be the most ideal solution.
Sounds good to me! To clarify, I wasn't advocating for dropping the UUID from anywhere, rather asking if the new project_version table would have its own UUID in addition. No real opinion one way or the other, just another way of retrieving from the database with a single column index.
It's also not unheard of to straight up use a UUID as the primary key either:
the latter of which I designed the schemas for. The UUID is generated deterministically based on the hash of the protobuf message being stored, enables content-addressable storage
It's also not unheard of to straight up use a UUID as the primary key either
Oh yeah for sure. We've started adopting UUIDv7 for new entities, and there is other pending work that will use it heavily.
A downside being that any UUID other than UUIDv4 must be pre-computed in the application. Support for UUIDv7 generation is coming in Postgres 18. I've not yet ran into a situation where this is a problem though.
It felt odd at first to name a column in project_version, version. But not too opinionated on this so naming it version is fine with me.
Just my two cents, not taking a position on the old "singular vs plural" debate...as a table holding multiple instances of project versions, to me project_versions (plural) describes its intent a little better. Would a column named version make more sense in this case?
Having the table name be plural would certainly help disambiguate this particular case.
I have no strong opinions on singular vs plural for tables names either, but I do think consistency is important. So unless we make all table names plural at some point, it's best to not stray from the existing pattern here.