Proposal: change externalId to externalPrefixOrId

Open goneall opened this issue 2 years ago • 2 comments

externalPrefixOrId would have the semantics that an external element ID matches if it starts with the externalPrefixOrId value. This is compatible with representing the full ID, since the full ID "starts with" its own entire string.

Changing the semantics of the externalId property used in ExternalMap would dramatically reduce the size of SPDX documents in certain scenarios.

For example, in 96b_aerocore2-hello_world-build.json, there is an external ID for every source file used in the build. In this simple example, there are 95 external IDs for source files. In larger builds, you could have thousands of external IDs. In the example, and in all SPDX 2.X documents converted to SPDX 3.0, these files will have a common prefix. This change would reduce the number of ExternalMap entries by a factor of approximately 90 in the example. It also has the benefit of being compatible with the SPDX 2.X spec.
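As a rough illustration of the size reduction, here is a minimal sketch using made-up IDs. The namespace URL and the `SPDXRef-File-` naming are assumptions for illustration only and are not taken from the actual 96b_aerocore2 file.

```python
import os.path

# Hypothetical external IDs for 95 source files that share one document namespace.
# The namespace and ID pattern are invented for this sketch.
NAMESPACE = "https://example.org/spdxdocs/sources#SPDXRef-File-"
external_ids = [f"{NAMESPACE}{i}" for i in range(1, 96)]

# os.path.commonprefix compares strings character by character,
# so it works on arbitrary strings, not only on file paths.
shared_prefix = os.path.commonprefix(external_ids)

# With exact IDs, the ExternalMap needs one entry per file.
entries_with_exact_ids = len(external_ids)

# With prefix semantics, one entry covers every ID that starts with the prefix.
entries_with_prefix = 1 if all(i.startswith(shared_prefix) for i in external_ids) else len(external_ids)

print(entries_with_exact_ids, entries_with_prefix, shared_prefix)
```

In a real document, not every external ID would necessarily share the same prefix, so the actual reduction depends on how the IDs were minted.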

This proposal does have a downside for the consumer: they cannot use a simple hash map to represent the external map. Note that this same downside exists in the SPDX 2.X spec and we have not received any negative feedback, which suggests it may not be too much of a problem.
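To make the hash map point concrete, here is a minimal sketch contrasting an exact lookup with a prefix lookup. The map contents, the function names, and the choice to prefer the longest matching prefix are all assumptions for illustration; the proposal does not specify how overlapping prefixes should be resolved.

```python
from typing import Optional

# Hypothetical external map: externalPrefixOrId -> location hint.
# Keys may be a prefix or a complete element ID.
external_map = {
    "https://example.org/spdx/build-doc#SPDXRef-": "https://example.org/spdx/build-doc.json",
    "https://example.org/other/element-42": "https://example.org/other/doc.json",
}


def lookup_exact(element_id: str) -> Optional[str]:
    """Current semantics: a single hash lookup on the full ID."""
    return external_map.get(element_id)


def lookup_prefix(element_id: str) -> Optional[str]:
    """Proposed semantics: scan every key and test startswith.

    Preferring the longest matching key is an assumption of this sketch.
    """
    best: Optional[str] = None
    for key in external_map:
        if element_id.startswith(key) and (best is None or len(key) > len(best)):
            best = key
    return external_map[best] if best is not None else None


file_id = "https://example.org/spdx/build-doc#SPDXRef-File-1"
print(lookup_exact(file_id))   # no exact key for this ID
print(lookup_prefix(file_id))  # matched through the shared prefix
```

The prefix lookup costs one string comparison per map entry for each query, where the exact lookup is a single hash probe. A trie or a sorted key list could narrow that gap, at the cost of more consumer-side code.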

goneall, Nov 04 '23 23:11

Decision: Leave as is. The solution is compatible, so it could be implemented post 3.0 if this does become an issue.

goneall, Nov 07 '23 17:11

I think this might be viable but definitely has its downsides.

We should be careful to understand the following:

  • The chances of producers inadvertently asserting elements as external would go up very significantly with prefixes compared with specific IDs.
  • This really only helps in cases that follow the historical SPDX approach of basing element IDs on the ID of the document they are defined in. In v3, elements are not implicitly tied to any particular document and IDs can be anything.
  • The chances of a given element being associated with the wrong definingArtifact are significantly higher. This is again because v3 IDs can be anything, so elements from different Artifacts could share the same prefix.
  • This shifts a significant amount of processing responsibility to the consumer. To determine the list of external elements in the document, they would need to go through all element IDs in the document and do partial string comparisons against the externalPrefixOrId list.
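The consumer-side scan and the risk of unintended matches can be sketched as follows. The IDs, prefixes, and function name are hypothetical and only illustrate the concerns listed above.

```python
from typing import Dict, List


def classify_external(element_ids: List[str], prefixes: List[str]) -> Dict[str, List[str]]:
    """Return each element ID that matches at least one prefix, with all its matches.

    Every ID is compared against every prefix, so the cost grows with
    len(element_ids) * len(prefixes).
    """
    matches: Dict[str, List[str]] = {}
    for element_id in element_ids:
        hits = [p for p in prefixes if element_id.startswith(p)]
        if hits:
            matches[element_id] = hits
    return matches


# Hypothetical externalPrefixOrId values declared by a producer.
prefixes = [
    "https://example.org/ns/",
    "https://example.org/ns/lib-a/",
]

# Hypothetical element IDs referenced in the document.
element_ids = [
    "https://example.org/ns/lib-a/file1",    # matches both prefixes: which artifact defines it?
    "https://example.org/ns/local-element",  # defined locally, but swept up by the broad prefix
    "urn:uuid:1234",                         # matches nothing
]

for element_id, hits in classify_external(element_ids, prefixes).items():
    print(element_id, "->", hits)
```

In this sketch, the first ID is ambiguous between two map entries, and the second is treated as external only because it happens to share a namespace with the broad prefix.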

My subconscious is telling me there are a couple more caveats like this but I have not yet been able to bubble them to the surface. I figured I would offer initial comments now rather than waiting to get any evaluation perfect.

sbarnum, Nov 09 '23 15:11