hyades icon indicating copy to clipboard operation
hyades copied to clipboard

Decouple the concept of project from project versions

Open nscuro opened this issue 8 months ago • 7 comments

Current Behavior

Currently, multiple versions of the same project are represented as separate, independent records in the PROJECT table.

The concept of a project version is treated very loosely: the fact that two PROJECT records represent different versions of the same project is merely implied by both sharing the same NAME column.

This poses a challenge for features such as portfolio ACL, where it becomes increasingly impractical to explicitly grant access to individual project versions.

Proposed Behavior

Decouple project from project version. Enable features such as portfolio ACL to apply to projects, rather than individual project version.

Most of what is currently associated with a project would switch to being associated with a project version instead. A rough sketch:

erDiagram
    project {
        bigint id pk
        text name
    }

    project_version {
        bigint id pk
        bigint project_id fk
        text name
    }

    component {
        bigint id pk
        bigint project_version_id fk
        text name
        text version
        text purl
    }

    team {
        bigint id pk
        text name
    }

    project_version |o--o{ component: "contains"
    project_version }|--|| project: "is version of"
    project }o--o{ team: "is accessible by"

Checklist

nscuro avatar Mar 31 '25 21:03 nscuro

FYI @jhoward-lm @ashearin. Needs more fleshing out obviously but raised it for now so we have it documented. Any opinions, input, or requests as to how this should look and behave is much appreciated.

nscuro avatar Mar 31 '25 21:03 nscuro

A few questions:

  • in the proposed project_version, should the name column be version?
  • would project_version have its own UUID column/field?
  • how would the relationship with other tables change? e.g. ANALYSIS, BOM, DEPENDENCYMETRICS, etc.
    • I imagine this would be on a case-by-case basis to determine whether the relationship only makes sense for a specific version of a project or for the version-agnostic project as a whole, but in general I agree with your assessment in the issue description

Unless I'm mistaken, it seems like there should only be one entry for a particular version on a given project. Assuming version was the intended column name, would you be opposed to either:

  • making project_version's primary key a composite of the project_id foreign key and version
  • enforcing a unique index/constraint on project_id and version (if the auto-incrementing primary key id column is needed to satisfy a DataNucleus requirement or something)
---
title: Proposed project_version table options
---
erDiagram
    "project_version (composite PK)" {
        bigint project_id PK, FK
        text version PK
    }

    "project_version (unique constraint)" {
        bigint id PK
        bigint project_id UK, FK
        text version UK
    }

Without some kind of unique constraint, it seems like it could end up with situations like

id project_id version
1 1 v0.1.0
2 1 v0.1.0

jhoward-lm avatar Apr 04 '25 03:04 jhoward-lm

in the proposed project_version, should the name column be version?

It felt odd at first to name a column in project_version, version. But not too opinionated on this so naming it version is fine with me.

would project_version have its own UUID column/field?

Without having done deep investigation, I am anticipating most fields and logic of the current PROJECT table to transition to PROJECT_VERSION. If that turns out to be true, dropping the UUID would require a lot of refactoring since significant parts of the code are currently accepting and passing around project UUIDs. This includes REST API endpoints.

Generally I agree with your implied suggestion in that the name+version pair in itself should be plenty to uniquely identify a project version, though. As usual, not opposed to dropping the UUID, but need to investigate what'd be the impact.

how would the relationship with other tables change? e.g. ANALYSIS, BOM, DEPENDENCYMETRICS, etc.

You answered this yourself IMO, not much to add at this point.

[...] it seems like there should only be one entry for a particular version on a given project

Correct.

For reference, here's how that's currently dealt with on the schema-level: https://github.com/DependencyTrack/hyades-apiserver/blob/c14fbce0667aa476a61b715a6ac79b1653a6f76c/src/main/resources/migration/changelog-v5.4.0.xml#L17-L30

making project_version's primary key a composite of the project_id foreign key and version

This would be ideal IMO, but sadly a pain to make work with DataNucleus: https://www.datanucleus.org/products/accessplatform_6_0/jdo/mapping.html#application_identity

Would need to experiment with this, but my past attempt to make something similar work were not successful.

enforcing a unique index/constraint on project_id and version

This would basically be what we have now, and what I linked above. This is known to work, although it might not be the most ideal solution.

nscuro avatar Apr 04 '25 08:04 nscuro

Sounds good to me! To clarify, I wasn't advocating for dropping the UUID from anywhere, rather asking if the new project_version table would have its own UUID in addition. No real opinion one way or the other, just another way of retrieving from the database with a single column index.

It's also not unheard of to straight up use a UUID as the primary key either:

the latter of which I designed the schemas for. The UUID is generated deterministically based on the hash of the protobuf message being stored, enables content-addressable storage

jhoward-lm avatar Apr 04 '25 22:04 jhoward-lm

It's also not unheard of to straight up use a UUID as the primary key either

Oh yeah for sure. We've started adopting UUIDv7 for new entities, and there is other pending work that will use it heavily.

A downside being that any UUID other than UUIDv4 must be pre-computed in the application. Support for UUIDv7 generation is coming in Postgres 18. I've not yet ran into a situation where this is a problem though.

nscuro avatar Apr 05 '25 14:04 nscuro

It felt odd at first to name a column in project_version, version. But not too opinionated on this so naming it version is fine with me.

Just my two cents, not taking a position on the old "singular vs plural" debate...as a table holding multiple instances of project versions, to me project_versions (plural) describes its intent a little better. Would a column named version make more sense in this case?

jhoward-lm avatar May 05 '25 13:05 jhoward-lm

Having the table name be plural would certainly help disambiguate this particular case.

I have no strong opinions on singular vs plural for tables names either, but I do think consistency is important. So unless we make all table names plural at some point, it's best to not stray from the existing pattern here.

nscuro avatar May 05 '25 14:05 nscuro