tools-python icon indicating copy to clipboard operation
tools-python copied to clipboard

Inconsistent validation in writing and reading SPDX files; inconsistent examples

Open mvhensbergen opened this issue 3 years ago • 2 comments

In the examples directory, there are the following two scripts:

  • write_tv.py
  • pp_tv.py

Running

python3 write_tv.py output

generates a spdx file without problem. Reading this file again:

python3 pp_tv.py output output2
Invalid DocumentNamespace value spdx, must contain a scheme (e.g. "https:") and should not contain the "#" delimiter, line:5
Errors encountered while parsing

The given line is:

DocumentNamespace: spdx

It seems that the writing of this value passed parsing in the writer, but reading the same value gives a mismatch.

mvhensbergen avatar Apr 12 '22 06:04 mvhensbergen

I have spent some time in the code to see what is going on, and here is my analysis.

Writing In write_tv.py, a Document object is created and subsequently written to a file using write_document from from spdx.writers.tagvalue. This function has validate=True as default.

This implies that validate_namespace in document.py is called, but this only checks if there is a value, but no validation on the value.

The document is then written to disk without errors. If it would have given errors, an exception handler in write_tv would have triggered.

Reading When reading the file, it is being validated against validators from validations.py, in particular function validate_doc_namespace.

This function does not only check if the value is present, but also checks if the value starts with a certain value.

Reading the outputfile of write_tv fails, because write_tv allows the invalid value 'spdx' being written in the file, which is not accepted by the stringent check in validations.py.

Suggestion With my limited view on the project, I would say that it should be impossible to write a SPDX file with write_document that is not parsable by the projects own parser. Perhaps it would be a good idea to have the validation in document.py reuse the more stringent validators in validations.py so that there is only one set of validation function to maintain.

mvhensbergen avatar Apr 18 '22 11:04 mvhensbergen

Thanks for your issue! I had a look at it and came to the same conclusion. In my opinion there are two ways to fix this issue:

  • Easy fix: Adapt the write_tv.py so that a valid document with a valid namespace is generated.

  • A somewhat more 'consistent fix': As you already proposed-use the same validation for parsing and writing files. I am not sure why the validation was implemented in these two different ways in the first step. Maybe @pombredanne or @licquia can help here. What do you think about changing the code so that the validation of documents is also done with the methods from validations.py?

meretp avatar Oct 14 '22 14:10 meretp

The new refactored version now found on main features a completely separate validation layer that can be used for consistent validation. Additionally, all invalid examples have been removed.

armintaenzertng avatar Mar 30 '23 09:03 armintaenzertng