tools-python icon indicating copy to clipboard operation
tools-python copied to clipboard

refactor Builder

Open meretp opened this issue 1 year ago • 2 comments

Looking at the tools-python in more detail we observed some inconsistencies concerning the builder. The current implementation of the builder classes is dependent on the file format. We suggest to split the parser and builder so that the same builder can be used for different file formats. In our opinion, the builder belongs to the data model and should create a new document from the input of the parser specific to the file format. Furthermore, builders should follow standard patterns and should not need reset functionality - a new builder instance should be used instead to create a completely new object (a Package, File, or any other SPDX element). We also suggest to follow the design pattern "composition over inheritance" by using a top-level builder that has for example a package builder as a field and does not inherit from the package builder. To rebuild the builder according to these ideas we suggest the following tasklist:

  • [ ] create builder for document creation information
  • [ ] create builder for package information
  • [ ] create builder for file information
  • [ ] create builder for snippet information
  • [ ] create builder for other licensing information
  • [ ] create builder for relationships
  • [ ] create builder for annotations
  • [ ] create builder for reviews

meretp avatar Oct 19 '22 09:10 meretp

@pombredanne, @licquia What do you think about these suggested changes?

As discussed yesterday we will also reach out to the tech mailing list to get more opinions and to announce these major changes early.

meretp avatar Oct 21 '22 10:10 meretp

@meretp go for it. Do not ask for permission, only for forgiveness. The whole original RDF-centric design is a mess that needs cleaning anyway.

pombredanne avatar Oct 21 '22 10:10 pombredanne

There's also interest in this work in https://github.com/pypa/pip-audit/issues/251

kestewart avatar Oct 26 '22 15:10 kestewart

As someone who tried to use the original tooling (and when it didn't work for SPDX 2.2 tried to see if it could be 'easily' repaired - it couldn't!), I ended up writing some of my own tooling (referenced in https://github.com/pypa/pip-audit/issues/251) I agree with @nicoweidner that a data model needs to be developped (probably several to cover packages, files, documents) and then need to create separate APIs to both ingest files and build files. Good architecture says that the builders should be independent of the data. If we get it right, the data model should also be agnosttic to the type of SBOM so that builders for SPDX or CycloneDX should be capable of being produced from the same data model.

Happy to be involved in any capacity to help move this forward.

anthonyharrison avatar Oct 26 '22 15:10 anthonyharrison

I will close this.

The builder topic was discussed and it was decided that we will not add builders to the data model (at least in the beginning). Arguments against having builders were:

  • Constructors support named arguments
  • All properties are accessible anyway, so they can be set after creating an instance
  • Almost all validation concerns apart from mandatory properties will be handled by a separate validation layer

For these reasons, builders should not be necessary in most situations. One notable exception would be the tag/value parser, where properties are necessarily parsed one at a time - which would cause issues with instantiation when there are multiple required properties. But if this is a tag/value-specific concern, a suitable builder structure should be included in the tag/value parser package and not in the core data model.

nicoweidner avatar Dec 09 '22 12:12 nicoweidner