tern icon indicating copy to clipboard operation
tern copied to clipboard

Add SPDX generation using spdx-tools

Open armintaenzertng opened this issue 2 years ago • 19 comments

This is set up to produce the same SPDX output as the current spdx generation module while utilising the spdx-tools library. The goal is to replace the current module with this new one, which will allow easy migration to more SPDX formats as well as SPDXv3. I tried to stay close to the structure of the original implementation.

I tested this using the following commands: tern report -i golang:1.12-alpine -f spdxjson -o spdx_test.json and tern report -i golang:1.12-alpine -f spdxjson_new -o new_spdx_test.json I compared the resulting json files using jd, treating arrays as unordered sets. The only differences were in timestamps, UUIDs, and two differences in how json output is generated:

  • The enum value in spdx-tools is PACKAGE_MANAGER, not PACKAGE-MANAGER
  • The optional key documentDescribes has been deprecated and is therefore not used by the spdx-tools. Only the corresponding DESCRIBES relationship is serialized.

armintaenzertng avatar Jun 23 '23 06:06 armintaenzertng

I added support for YAML, XML and RDF-XML formats.

armintaenzertng avatar Jun 23 '23 12:06 armintaenzertng

@armintaenzertng Thank you very much for all your work on this! How does one denote which version of SPDX documents they want using this PR? I am assuming this PR, by default, generates SPDX 2.3 documents. However, we can't drop support for SPDX 2.2 since we have users who want it because it is the ISO standard version. There needs to be a way to denote SPDX version to generate on the command line before we can merge this.

rnjudge avatar Jul 10 '23 21:07 rnjudge

This PR currently replicates the behavior of the current state. That is, SPDX 2.2 is hardcoded into the output. I will gladly add support for multiple SPDX versions, but we should clarify how this would work in the current workflow.

  • Should there be an optional parameter that denotes the SPDX version (that gets ignored if no SPDX output is required)? If yes, how do we call that?
  • Or will we add two (and more in the future) parameters for the -f flag, like spdxjson2.2, spdxjson2.3, spdxjson3.0 etc.? This is probably easier to implement, but will lead to much code duplication, probably not the best idea.

armintaenzertng avatar Jul 11 '23 07:07 armintaenzertng

After some further consideration and having a deeper look at the code, point 2 from above might be the better alternative after all. I'll try to implement that.

armintaenzertng avatar Jul 11 '23 13:07 armintaenzertng

I added the versioning I described above. @rnjudge, please have a look if that is OK with you. :)

armintaenzertng avatar Jul 11 '23 15:07 armintaenzertng

@armintaenzertng do you want to schedule a zoom call about this? I would like to avoid mass code duplication as that was the whole point of using the SPDX tools library.

rnjudge avatar Jul 11 '23 16:07 rnjudge

Yes, certainly! :) Do you have my email address?

armintaenzertng avatar Jul 11 '23 16:07 armintaenzertng

I added a CLI version parameter for the output format. Formats that don't support this (i.e. everything except for SPDX) will raise an error if this is set.

armintaenzertng avatar Jul 13 '23 14:07 armintaenzertng

I added SPDX-2.3 functionality.

In particular, this means that if the SPDX version is 2.3, we set the primary_package_purpose of the container package to CONTAINER and omit concluded license, declared license and copyright text in SpdxPackages if possible.

armintaenzertng avatar Jul 17 '23 07:07 armintaenzertng

@rnjudge: tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations, so I'll mark this "Ready for review" now. There are still some open issues regarding Spdx documents with file information, which I collected in #1240. As these existed in the old SPDX implementation already, I'd propose to fix them in a separate PR after this one to keep things clearer (this one is already quite large).

armintaenzertng avatar Jul 20 '23 09:07 armintaenzertng

This would fix https://github.com/tern-tools/tern/issues/1211

vargenau avatar Jul 20 '23 10:07 vargenau

@rnjudge @armintaenzertng For specifying the SPDX version of the output, I would propose the following:

tern report -f [email protected] -i golang:1.12-alpine -o alpine.spdx

and

tern report -f [email protected] -i golang:1.12-alpine -o alpine.spdx.json

Advantages would be that we have one less argument and it is similar to the Syft syntax: https://github.com/anchore/syft/

spdx-tag-value: A tag-value formatted report conforming to the SPDX 2.3 specification
[email protected]: A tag-value formatted report conforming to the SPDX 2.2 specification
spdx-json: A JSON report conforming to the SPDX 2.3 JSON Schema
[email protected]: A JSON report conforming to the SPDX 2.2 JSON Schema

That is what I had prototyped in https://github.com/tern-tools/tern/pull/1228

What do you think?

vargenau avatar Jul 20 '23 10:07 vargenau

@vargenau: I believe the current implementation suggestion is a little more flexible in supporting more versions/formats in the future. Your proposed solution would result in five additional entrypoints (one per spdx format) per version.

Still, if necessary, the [email protected] usecase can be easily implemented by adding a new entrypoint with the generator just calling get_spdx_from_image_list with the right parameters.

armintaenzertng avatar Jul 26 '23 09:07 armintaenzertng

Hi @armintaenzertng My proposal aim was:

  • to have one less option
  • to have a syntax similar to Syft If it is more complex to implement, no problem.

vargenau avatar Jul 26 '23 10:07 vargenau

@rnjudge: tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations, so I'll mark this "Ready for review" now. There are still some open issues regarding Spdx documents with file information, which I collected in #1240. As these existed in the old SPDX implementation already, I'd propose to fix them in a separate PR after this one to keep things clearer (this one is already quite large).

Is is really -sv 2.3 ? In Rose message above, it was parser_report.add_argument('-fv', '--format-version',

vargenau avatar Jul 26 '23 10:07 vargenau

I meant -sv, this was changed according to this comment.

armintaenzertng avatar Jul 26 '23 10:07 armintaenzertng

I meant -sv, this was changed according to this comment.

Thank you, I had missed that comment.

vargenau avatar Jul 26 '23 11:07 vargenau

Small update: I changed the spdx-tools dependency to the new 0.8.1 version.

armintaenzertng avatar Aug 24 '23 13:08 armintaenzertng

small fix: I changed the check for the SPDX version to use the official string "SPDX-2.2" consistently.

armintaenzertng avatar Aug 29 '23 07:08 armintaenzertng