Make bom-ref use relative paths for workspace items as well
Similar to PURLs
Or just move away from using the filesystem paths in them for local dependencies. Technically the only requirement for them is to be unique; a hash will also work, although the human readability will suffer considerably.
Keeping them somewhat human-readable would be great, yes.
What if we replace "(path+file...)" on local dependencies with a hash or a counting suffix, or drop it altogether, based on a command-line option? The name and version can remain, leaving it mostly human-readable.
If the "drop it altogether" option is chosen, the user should be sure there aren't multiple dependencies with the same name and version on the system. Is this common?
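To make the counting-suffix idea concrete, here is a minimal sketch. The function name `bom_refs_with_counter` and the `#2` suffix format are assumptions made up for illustration, not anything the tool does today. It drops the path entirely and only appends a counter when the same name and version appear more than once.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds one bom-ref per (name, version) pair.
/// The first occurrence is plain `name@version`; later duplicates get
/// a counting suffix so the refs stay unique within the BOM.
fn bom_refs_with_counter(packages: &[(&str, &str)]) -> Vec<String> {
    // How many times each `name@version` has been seen so far.
    let mut seen: HashMap<String, usize> = HashMap::new();
    packages
        .iter()
        .map(|(name, version)| {
            let base = format!("{name}@{version}");
            let count = seen.entry(base.clone()).or_insert(0);
            *count += 1;
            let n = *count;
            if n == 1 {
                base
            } else {
                format!("{base}#{n}")
            }
        })
        .collect()
}
```

Note that the suffix depends on the input order, so the packages would need to be put in a deterministic order first for the result to be reproducible.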
When the repository is tracked in (private) source control, is it possible to set bom-ref to the VCS URL?
Also, how should we handle the case of subordinate packages? Say I've got app foo built in Tauri, which creates a Node project at the root of the repository, and a Rust project in subfolder src-tauri; in that case, citing a raw git URL for the Tauri portion is going to be incorrect.
When the repository is tracked in (private) source control, is it possible to set bom-ref to the VCS URL?
I would also be interested in the answer here, and in an extension of this to PURLs. As far as I can see, Cargo Metadata exports crates.io or nothing. Perhaps the problem lies in not using cargo to maintain concurrent versioning, which doesn't work for our CI approach.
PURLs do already include the VCS URL for dependencies from git; if you want to recover it, that's what you should be looking at, not bom-ref.
bom-ref is an opaque string according to the spec, so the only concerns for it are (1) being unique within the BOM and (2) being reproducible if we want reproducible SBOMs. There's an optional (3): being somewhat human-readable.
I guess we just ran into this one too: https://github.com/trustification/trustify/pull/546#issuecomment-2228572260 … The concern is that we'd leak build machine information, which seems true:
"bom-ref": "path+file:///Users/bob/repos/trustification/trustify/modules/importer#[email protected]",
I am OK with contributing a change, if there's consensus on which direction it should go :)
One way to deal with this could be to add a --hashed-ids flag and hash all IDs (bom-ref) if the user requests that. Another alternative could be to use the local (relative) path, which still might leak some info.
Panic messages in the executable itself also leak this information. strings target/release/importer | grep '/Users/bob/repos/trustification/' will reveal it just as well. The build path was never secret to begin with. If anything, I'd rather surface this information prominently so that people understand that, instead of giving a false impression of secrecy.
And even if we scrub the paths from IDs, they will still be included in other places, e.g. dependencies from the local filesystem. Do we scrub those as well? Knowing that your build included /Users/bob/team_shared/company_patched_serde rather than /Users/bob/experiments/serde seems valuable.
I don't think hashes are a good idea. However, I would accept a PR that optionally makes IDs reproducible, with say a --reproducible CLI flag, and also tackles other low-hanging fruit such as the UUID and creation date. We have heard that reproducible SBOMs are desirable, and this will also conceal the filesystem paths as a side effect.
In cargo auditable I just sort packages and use the indices as unique IDs. That could be one possible solution. (Although I haven't tested what happens in the fallback case where IDs are being compared after the ID format changed in v1.77 - the order might still depend on filesystem paths in that case. But I guess that's fine since order dependency on local paths would only happen for local dependencies outside the workspace, in which case we would like to record them anyway.)
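A rough sketch of the sort-and-index approach, for illustration only. The `Package` struct and its field names are hypothetical and this is not the actual cargo auditable code. It sorts by name, then version, then source, and uses the position in the sorted list as the ID.

```rust
/// Hypothetical package record; the field names are illustrative.
/// The derived ordering compares name, then version, then source.
/// Versions are compared as plain strings here, which is enough for a
/// stable order but is not semver ordering.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Package {
    name: String,
    version: String,
    source: String,
}

/// Sorts the packages and pairs each one with its index as the ID.
fn index_bom_refs(mut packages: Vec<Package>) -> Vec<(String, Package)> {
    packages.sort();
    packages
        .into_iter()
        .enumerate()
        .map(|(i, p)| (i.to_string(), p))
        .collect()
}
```

The source field only acts as a tie-breaker, so a local path can influence the order only when two packages share both name and version.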
I think there's a difference between a panic and serializing that information into an SBOM that's intended to be distributed.
I don't think hashes are a good idea. However, I would accept a PR that optionally makes IDs reproducible, […]
Wouldn't hashes of the current ID also be reproducible? I wouldn't see a difference there. That should be stable enough?
The ID currently includes filesystem path, so no. The SBOM should be possible to reproduce when run from a different filesystem path and/or on another machine. Therefore it should be independent from the filesystem paths.
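One way this could look, as a sketch under assumptions: the function name `relative_bom_ref` and the `location#name@version` output format are invented for illustration. Paths inside the workspace are made relative to the workspace root, so the checkout location drops out; paths outside the workspace are kept as they are.

```rust
use std::path::Path;

/// Hypothetical helper: builds a bom-ref from the package directory
/// relative to the workspace root. Packages outside the workspace keep
/// their full path, since that location is still worth recording.
fn relative_bom_ref(
    workspace_root: &Path,
    package_dir: &Path,
    name: &str,
    version: &str,
) -> String {
    // strip_prefix fails when package_dir is not under workspace_root.
    let location = package_dir
        .strip_prefix(workspace_root)
        .unwrap_or(package_dir);
    // The workspace root package itself ends up with an empty path.
    let location = if location.as_os_str().is_empty() {
        Path::new(".")
    } else {
        location
    };
    format!("{}#{}@{}", location.display(), name, version)
}
```

With this, the same workspace checked out under two different directories yields the same ID for a workspace member.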
The SBOM should be possible to reproduce when run from a different filesystem path and/or on another machine.
That's the information I was missing. In that case, yes, hashing the current ID is not good enough.