tools-python icon indicating copy to clipboard operation
tools-python copied to clipboard

Update to full SPDX v2.1

Open yash-nisar opened this issue 7 years ago • 5 comments

Update the SPDX Python libraries to the SPDX 2.1 specification. The SPDX 2.1 specification is a major upgrade from SPDX 1.2 supporting relationships between SPDX documents and SPDX elements.

yash-nisar avatar Mar 14 '18 09:03 yash-nisar

From @pombredanne : We should drop support for 1.2 alright and yes focus on new 2.x things. As far as I am concerned, I would be quite happy with a complete refactoring of the lib internals such that:

  1. tag/value is the primary target
  2. the internal model is NOT based on RDF data structures but for instance models based on attrs or cattrs, e.g. something pythonic and not tied to RDF.
  3. RDF would only be an adapter of sorts feeding in that model and an "after thought" rather than being central to the data model.

e.g. RDF/XML is not something I care too much for. more like a nice to have for me.

If a refactoring could be done to cut the internal model ties from RDF that would be a big win (and a big work too).

yash-nisar avatar Mar 18 '18 17:03 yash-nisar

@pombredanne @sschuberth @rtgdk It would be great if I could have your inputs on a few things :

Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?

Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)

I went through the docs of attrs, which will help us to get rid of the boilerplate code in classes that we have to define all the time.

We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?

IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both the parsers (tv and rdf) will call the validate(...) function as defined in class Document(...). Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..) function to return False when when it is called via the rdf parser. So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models. We could maybe pass additional fields like: validate(..., rdf=False, tv=True) and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else statements.

yash-nisar avatar Mar 18 '18 17:03 yash-nisar

Hi @yash-nisar. I'm sorry, but I'm going to pull myself out of this project as I personally have to need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests.

sschuberth avatar Mar 19 '18 13:03 sschuberth

No problem :). Thank you for your reviews.

On Mon, Mar 19, 2018 at 6:42 PM, Sebastian Schuberth < [email protected]> wrote:

Hi @yash-nisar https://github.com/yash-nisar. I'm sorry, but I'm going to pull myself out of this project as I personally have to need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/spdx/tools-python/issues/47#issuecomment-374206642, or mute the thread https://github.com/notifications/unsubscribe-auth/AOLKgBG3_FFvJpCSTqQ6bqIOkjHHIne7ks5tf67XgaJpZM4SqEXV .

--

http://www.somaiya.edu http://www.somaiya.edu/kjsmc
http://www.nareshwadi.org http://www.somaiya.edu
http://www.helpachild.in http://nareshwadi.org

yash-nisar avatar Mar 19 '18 14:03 yash-nisar

Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?

I would prefer to maintain support for the RDF. I know there are several users of RDF for the Java libraries (since they are finding bugs šŸ˜‰ I assume there will also be interested users for Python as well. I’m OK with the primary focus on the tag/value since that is the larger community. I can help provide support for any RDF related questions.

Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)

There are some discussions of supporting other formats like JSON, YAML, and JSON-LD. Having a design which allows for conversion between various formats would be very helpful for forward compatibility.

We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?

Is there a design where we have a core model which is (semi) independent of the format and converts to build and parse different formats? I have to admit I have not dived deeply into the current Python implementation, but the basic idea of having a model independent of the serialization format seems like it should work. Where things got messy in the Java code is the validation (as you point out below) and maintaining the information on where in the input file the validation errors occur.

IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both the parsers (tv and rdf) will call the validate(...) function as defined in class Document(...). Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..) function to return False when when it is called via the rdf parser. So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models. We could maybe pass additional fields like: validate(..., rdf=False, tv=True) and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else statements.

I'll provide an opinion or two on this with the caveat that I have not spent a lot of time in the Python code, so the replies below may not match the current design/structure.

For the Java libraries, we tried to separate out the validation into 2 separate categories - parsing errors and specification related errors. The tag/value specific requirements could be thought of as parsing errors since they are typically related to positional requirements to allow unambiguous parsing. The specification related errors should be common for both RDF and Tag/Value. For the positional requirements in tag/value we just threw exceptions rather than handle them in the validation since the resultant model could not be reliably produced.

If we want the validation to incorporate format specific errors, could we record those errors during parsing and store them in the model rather than passing a flag to the validation method? We know when we are parsing which format is being used.

One other consideration in the validation is maintaining the location of the error in the input format. This proved to be a challenge for the Java code design. The line number and character positions need to be captured during the parsing and reported back during validation. For the tag/value we only provided this for the parser errors (not the spec related errors).

goneall avatar Mar 19 '18 18:03 goneall

Hi @yash-nisar! We are currently cleaning up this repo and I came across your issue that is still asking for support for spec version 2.1. Since this issue is already a bit older and many changes have been made in the repo: Are you still missing fields for spec 2.1?

Concerning the discussion above: Beginning with #244 we are planning to refactor the data model and the builder structure. This will enable us to tackle the refactoring and isolation of the individual formats mentioned here. In the course of the refactoring, the validation will also have to be checked. I like the approach described by @goneall for the java tools to distinguish between parser-related validation (which is probably less of a validation and more of a basic check if the content is parsable) and spec-related validation. With #212 we have an open issue concerning the validation.

In your view @yash-nisar, are there any outstanding issues from this issue or could it be closed?

meretp avatar Nov 07 '22 10:11 meretp

Closing this for now, please ping if it should be reopened

nicoweidner avatar Nov 21 '22 08:11 nicoweidner