
BOM Retention Policy

Open msymons opened this issue 5 years ago • 8 comments

Current Behavior:

Per post in Slack by @stevespringett regarding what happens to BOMs when they are uploaded to Dependency-Track:

The original BOMs are dismissed. When BOMs are uploaded, they are validated, processed, and discarded. When the BOMs are processed, a lot of the data (but not all) available in the BOM is applied to the DT object model. When a user requests a BOM be created for a project... the BOM is dynamically constructed from the DT object model. So it’s not original, but you can get a BOM back.

...and then:

I have been thinking about an option to keep BOMs that have been uploaded along with a retention policy.

Proposed Behavior:

An option to keep BOMs that have been uploaded, along with a retention policy, would be very useful indeed.

First of all, I think that the BOM API endpoint (i.e. upload) would need to be extended with a parameter telling DT whether or not to keep the uploaded BOM. The default should be "false"; that is, keeping a BOM needs to be a sin of commission, not omission.

/v1/bom/cyclonedx/project/{uuid} would need an optional parameter(s) in order to extract a kept BOM.
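To make the opt-in concrete, here is a minimal sketch of what an upload payload with such a flag could look like. The `retain` parameter does not exist in the current API; its name, the payload shape, and the default are assumptions for illustration only.

```python
# Hypothetical sketch: an upload payload with an opt-in "retain" flag.
# The "retain" field is invented for illustration; it is not part of
# the current Dependency-Track API.
import base64

def build_bom_upload_payload(project_uuid: str, bom_xml: bytes,
                             retain: bool = False) -> dict:
    """Build an upload request body. retain defaults to False so that
    keeping a BOM is an explicit act of commission, not omission."""
    return {
        "project": project_uuid,
        "bom": base64.b64encode(bom_xml).decode("ascii"),
        "retain": retain,  # hypothetical flag; default is to discard
    }

payload = build_bom_upload_payload(
    "11111111-2222-3333-4444-555555555555", b"<bom/>", retain=True)
```

The key design point is only that the flag defaults to "discard", matching today's behavior, so existing pipelines are unaffected.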

In my environment, we use Jenkins multi-branch scripted pipeline jobs and perform nightly builds of trunk/master that upload a BOM; i.e. the version is always SNAPSHOT. These need no retention at all. Implementation of this enhancement should allow tweaking of the pipelines so that release builds generate release BOMs. It is these that we would want to explicitly tell DT to keep.

We have three environments: Integration, Staging, Production. Anything that is released is automatically deployed to Integration. However, not everything that is deployed to Integration gets promoted to Staging, and even fewer versions get promoted from there to Production. This is worth mentioning because it has ramifications for two or three things in DT:

  • When a BOM is uploaded you might know you want to keep it, but you do not yet have any idea of how long you will want to keep it
  • We thus need a way to tag/label retained BOMs dynamically... and to do so at any time after the initial upload. This would let us indicate which environment(s) a BOM can be "found" in. We already have scripting that extracts all this information from our environments, so it would not be much extra work to update DT if DT allowed it. Note that there is currently no API endpoint for changing tags (and what is needed in this situation might be "something like a tag, but not a tag")
  • A retained BOM might be in Integration AND Staging AND Production all at the same time.
  • Should it be possible to upload old BOMs (if you have them to hand) to flesh things out?
  • Based on all the above, one variant on a retention policy might be, say, "remove anything that is over one month old AND NOT tagged as being in any environment". One might already be deploying release BOMs to GitHub or a Maven repository (etc.), which would allow for strictness in the retention policy combined with a safety net... an "ability to rewind".
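That last policy variant is simple enough to sketch. The record layout below (an `uploaded` timestamp plus an `environments` list per retained BOM) is invented for illustration; only the rule itself comes from the proposal above.

```python
# Sketch of the policy "remove anything over one month old AND NOT
# tagged as being in any environment". Field names are hypothetical.
from datetime import datetime, timedelta

def boms_to_purge(boms, now, max_age=timedelta(days=30)):
    """Return retained-BOM records eligible for deletion: older than
    max_age and not deployed to any environment."""
    return [b for b in boms
            if now - b["uploaded"] > max_age and not b["environments"]]

boms = [
    {"id": 1, "uploaded": datetime(2021, 1, 1), "environments": []},
    {"id": 2, "uploaded": datetime(2021, 1, 1), "environments": ["Production"]},
    {"id": 3, "uploaded": datetime(2021, 6, 1), "environments": []},
]
stale = boms_to_purge(boms, now=datetime(2021, 6, 10))
```

Only BOM 1 is purged: BOM 2 is old but still tagged as being in Production, and BOM 3 is untagged but recent, which is exactly the "safety net" behavior described.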

msymons avatar Dec 17 '20 00:12 msymons

I would like to see this functionality too. It could enable some interesting scenarios like integration with external analysis tools/services.

For example, configuring a webhook for BOM_CONSUMED that triggers additional analysis by an external service.

coderpatros avatar Feb 02 '21 21:02 coderpatros

Here are my thoughts... DT already keeps complete metadata about SBOMs when they're uploaded. It would be simple to add a BLOB column to store the SBOMs as well, along with the ability to view the history of all SBOMs that were published and to retrieve them from the BLOB column. Simple, and it would work well with the new ACL model.
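A minimal sketch of the BLOB-column idea, using SQLite for illustration: store each uploaded BOM verbatim alongside its project and upload timestamp, then list and retrieve the originals later. The schema here is invented; Dependency-Track's actual schema and database differ.

```python
# Illustrative only: archive original BOM bytes in a BLOB column and
# read the per-project history back, newest first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bom_archive (
        id INTEGER PRIMARY KEY,
        project_uuid TEXT NOT NULL,
        uploaded_at TEXT NOT NULL,
        bom BLOB NOT NULL
    )
""")
original = b'{"bomFormat": "CycloneDX", "specVersion": "1.4"}'
conn.execute(
    "INSERT INTO bom_archive (project_uuid, uploaded_at, bom) VALUES (?, ?, ?)",
    ("1111-2222", "2021-08-03T02:00:00Z", original),
)
# History for a project, newest first:
rows = conn.execute(
    "SELECT uploaded_at, bom FROM bom_archive WHERE project_uuid = ? "
    "ORDER BY uploaded_at DESC", ("1111-2222",)
).fetchall()
```

Because the bytes are stored verbatim, what comes back is the original BOM as uploaded, not a reconstruction from the DT object model.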

@msymons wrote:

First of all, I think that the BOM API endpoint (i.e. upload) would need to be extended with a parameter telling DT whether or not to keep the uploaded BOM. The default should be "false"; that is, keeping a BOM needs to be a sin of commission, not omission.

Not sure I agree with that statement. Why wouldn't you want to keep history for audit purposes? I think this should be a system configuration option with the ability to:

  • Enable/disable BOM retention capability
  • Set default value (retain or discard) for SBOMs that are uploaded
  • Enable/disable the ability (via API) to override the default (if BOM retention is enabled)
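The precedence between those three settings could be sketched as follows. All names are hypothetical; this only illustrates the decision order being proposed: the global switch gates everything, a per-upload override wins when permitted, and the system default applies otherwise.

```python
# Hypothetical precedence for the three proposed settings.
def should_retain(retention_enabled: bool,
                  default_retain: bool,
                  allow_api_override: bool,
                  request_retain=None) -> bool:
    """Decide whether to keep an uploaded BOM.
    request_retain is the per-upload API override (None = not supplied)."""
    if not retention_enabled:
        return False                  # capability switched off entirely
    if allow_api_override and request_retain is not None:
        return request_retain         # caller's explicit choice wins
    return default_retain             # fall back to the system default
```

With this layout, enabling retention with a "retain" default gives history-for-audit out of the box, while operators who share msymons's concern can flip the default to "discard" and make callers opt in.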

In terms of tags/labeling, please create a new ticket once BOM retention is implemented. We need to make incremental improvements and allow people time to use and form opinions of them.

@coderpatros Is there a way in the CycloneDX BOM Repo Server to disable publishing, and yet side-load SBOMs to the repo server? I want the ability to release a project from DT that would put the project in read-only mode, and securely publish the SBOM to BRS. What I'm thinking is shipping BRS as another container with only the ability to retrieve SBOMs from it, but I'll still need a way to publish to BRS.

stevespringett avatar Aug 03 '21 02:08 stevespringett

@coderpatros per the readme:

The server supports sharing repository storage between multiple frontend instances. Which can be used for full active/active high availability clustering.

When deploying to multiple data centres it is recommended to have one master instance that supports publishing BOMs. And use data replication to any other target data centres used for distributing BOMs.

This might be a possible solution. We could ship with two BRS containers, one which DT publishes to and the other being read-only. However, are there any docs for enabling data replication?

stevespringett avatar Aug 03 '21 02:08 stevespringett

Is there a way in the CycloneDX BOM Repo Server to disable publishing, and yet side-load SBOMs to the repo server? I want the ability to release a project from DT that would put the project in read-only mode, and securely publish the SBOM to BRS.

Technically you could. But it wouldn't be supported without using the API publishing mechanism from another instance.

are there any docs for enabling data replication?

That is left as an exercise for the reader :) On a serious note, it largely depends on what you want to achieve. i.e. if you were deploying it to Azure and wanted to achieve multi-region high availability I would recommend Azure File Storage with read access geo-redundant storage. That takes care of replication to a secondary data center out of the box.

For this DT use case you could approach it a couple of ways.

One would be to have BRS behind the scenes only. And have DT essentially act as a proxy.

The other would be to have a behind the scenes instance that allows publishing from the DT server only. And a storage volume shared with a front end read only instance that allows public retrieval.

The behind the scenes BRS, with DT as a proxy, would have the benefit of being able to control authentication and authorization in DT.

It might be easier to just implement this in DT. But the benefit of using BRS is that you get BOM content type negotiation and conversion out of the box. As in, you can publish in your preferred format and consumers can request their preferred format. It could also allow you to drop handling of JSON or XML format in DT code and offload that to BRS as well.
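The content-type negotiation being described can be sketched as a small Accept-header lookup. The CycloneDX media types (`application/vnd.cyclonedx+xml`, `application/vnd.cyclonedx+json`) are real; the function itself is a simplified stand-in for what BRS does, with the actual format conversion stubbed out.

```python
# Simplified sketch of BOM format negotiation: the consumer asks for a
# format via the Accept header and the server serves the stored BOM in
# that format. Conversion itself is out of scope here.
SUPPORTED = {
    "application/vnd.cyclonedx+json": "json",
    "application/vnd.cyclonedx+xml": "xml",
}

def negotiate_format(accept_header: str, default: str = "json") -> str:
    """Pick a response format from an Accept header, ignoring any
    media-type parameters, falling back to a default on no match."""
    for media_type in (t.strip().split(";")[0] for t in accept_header.split(",")):
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default
```

This is the "publish in your preferred format, consume in yours" property: the publisher's format and the consumer's format are decoupled by the server.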

coderpatros avatar Aug 03 '21 07:08 coderpatros

The other benefit of using BRS is that retention of the original BOM can be offloaded to BRS too.

coderpatros avatar Aug 03 '21 07:08 coderpatros

Punting to a future release; waiting for the CDX exchange API to be public and finalized.

stevespringett avatar Mar 23 '22 14:03 stevespringett

Rest assured that a rescheduling for 4.10 is not a bad thing. It's because this functionality will be done *right*.

msymons avatar Dec 20 '22 17:12 msymons

My 2c: I think storing the original BOM would be useful (we archive them separately for debugging purposes), but much more useful for our use cases would be if the DT API exposed all of the information in the original BOM.

The point in time at which we want information from the original BOM is when we are processing a vulnerability or policy violation. That information is only easily accessible if it's present in the API response from DT when asking about the affected component (or project). Being able to download the original BOM at that point, parse it, and then look up the information we want is technically possible, but it is much more cumbersome than having direct, reliable access to the information in the API response.

mykter avatar Dec 04 '23 12:12 mykter