scancode-toolkit
Analysis exception processing SPDX file: Missing document namespace
Description
I used the openSUSE OBS to build an RPM of the GNU "hello world" example. Using this RPM as input, I would like to create an SBOM for it. When I verify the SBOM, the verifier reports that the document namespace is missing. I don't see a scancode command line option to specify it. I do see that a missing document namespace was reported as a bug a couple of years ago and was fixed.
How To Reproduce
wget https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz
wget https://download.opensuse.org/repositories/home:/geokraft4/15.3/src/hello-2.10-4.2.src.rpm
wget https://download.opensuse.org/repositories/home:/geokraft4/15.3/x86_64/hello-2.10-4.2.x86_64.rpm
scancode -clpeui -n 1 --verbose --json-pp ~/spdxdocs/hello-rpm-spdx.json ~/spdxhello/hello-2.10-4.2.x86_64.rpm
java -jar ~/spdxtools/tools-java-1.0.4-jar-with-dependencies.jar Verify ~/spdxdocs/hello-rpm-spdx.json
System configuration
host: SLES 15 SP3
scancode 30.1.0 installed from git clone.
mkdir ~/source
cd ~/source
git clone https://github.com/nexB/scancode-toolkit.git
cd scancode-toolkit
git checkout develop
./configure
source venv/bin/activate
sudo zypper install -y python3-devel
sudo pip install --upgrade pip setuptools wheel
sudo pip install scancode-toolkit
which scancode
Example hello-rpm-spdx.json file: hello-rpm-spdx.json.txt
The file you generated with scancode is ScanCode's own JSON output, not an SPDX document, because you used the --json-pp option.
Refer to https://scancode-toolkit.readthedocs.io/en/latest/cli-reference/output-format.html#all-scan-output-options
where the SPDX output options are --spdx-rdf and --spdx-tv.
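To illustrate why the verifier complains about a missing document namespace on a --json-pp file, here is a minimal sketch that lists which mandatory SPDX top-level fields a parsed JSON document lacks. The field names are my reading of the SPDX 2.x JSON property names, the helper names are hypothetical, and the sample dictionary is only a stand-in for ScanCode's native JSON layout (not real scan output). This is not how the SPDX Java tools perform validation.

```python
import json

# Top-level properties that, as I understand the SPDX 2.x JSON format,
# every SPDX document must carry (assumption, see lead-in).
REQUIRED_SPDX_FIELDS = (
    "spdxVersion",
    "dataLicense",
    "SPDXID",
    "name",
    "documentNamespace",
    "creationInfo",
)


def missing_spdx_fields(document):
    """Return the required SPDX top-level fields absent from a parsed JSON document."""
    return [field for field in REQUIRED_SPDX_FIELDS if field not in document]


def load_and_check(path):
    """Parse the JSON file at `path` and list the SPDX fields it lacks."""
    with open(path, encoding="utf-8") as handle:
        return missing_spdx_fields(json.load(handle))


# Hypothetical stand-in for ScanCode's native JSON, which is organised
# around scan headers and per-file results rather than SPDX fields.
scancode_like = {"headers": [], "files": []}
print(missing_spdx_fields(scancode_like))
```

Running load_and_check on the hello-rpm-spdx.json file from the report would be expected to list documentNamespace among the missing fields, since that file is ScanCode JSON and not SPDX JSON.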
I could generate a file using the following command:
scancode -clpeui -n 1 --verbose --spdx-rdf hello-rpm-spdx.rdf ./hello-2.10-4.2.x86_64.rpm
This file is now SPDX output.
But verification still fails when I run the SPDX Verify tool with the following command:
java -jar ./tools-java-1.0.4/tools-java-1.0.4-jar-with-dependencies.jar Verify ./hello-rpm-spdx.rdf
I get the following:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
21:27:08.021 [main] ERROR org.apache.jena.rdf.model.impl.RDFDefaultErrorHandler - (line 7 column 73): {E211} Base URI is null, but there are relative URIs to resolve.: <SPDX Document created by ScanCode Toolkit>
21:27:08.035 [main] ERROR org.apache.jena.rdf.model.impl.RDFDefaultErrorHandler - (line 42 column 50): {E211} Base URI is null, but there are relative URIs to resolve.: <SPDXRef-001>
This SPDX Document is not valid due to:
Missing required document name
Document must have at least one relationship of type DOCUMENT_DESCRIBES
By the way, I ran the Verify command with the following versions of the SPDX Java tools:
- 1.0.0
- 1.0.4
- 1.1.0
The "not valid" message appeared with all of these versions.
We need to fix this so that we produce valid SPDX documents.
The RPM is now at https://download.opensuse.org/repositories/home:/geokraft4/15.3/x86_64/hello-2.10-4.3.x86_64.rpm and the tag/value output passes SPDX validation correctly... but IMHO its content is not right. We should reuse the SPDX output module from ScanCode.io (or take inspiration from it at https://github.com/nexB/scancode.io/blob/main/scanpipe/spdx/ )
Moving to 32.1 milestone for now.