Enable more fine-grained definition of curations

Open ohecker opened this issue 1 year ago • 0 comments

Overview

As somebody responsible for creating curations, I want to be able to write curations that remove or add single license and copyright entries, instead of having to redefine the list of licenses or copyright entries completely. While this does not directly give a benefit when curating a single component, it makes it easier to transfer curations, e.g. to other versions of the same component.

Proposal of approach for ADD/DELETE operations

The given approach works on the Scancode input data. This introduces some coupling to the Scancode data model but avoids coupling to the ComponentInfo data model. Working on the input data model gives fine-grained control and makes it possible to write curation rules which avoid being triggered too broadly.

DELETE of Licenses

Deleting found licenses is done by defining rules which result in ignoring the license finding(s) of Scancode rules in files within the scanned codebase. The following "conditions" are used for defining a rule:

  • path of the file within the sources (defined as a regular expression)
  • identifier of the rule (defined as a regular expression)
  • matchedText of the finding (defined as a regular expression)

This kind of curation is independent of the ComponentInfo data model but introduces a coupling to the Scancode data model and rules.
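
A minimal Python sketch of how such a DELETE rule could be evaluated. This is not Solicitor's implementation: the class and field names are hypothetical, the rule identifiers in the sample data are made up, and two semantics the proposal leaves open are assumed here (a regular expression must match the whole value, and an omitted condition matches anything).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenseFinding:
    path: str             # path of the file within the sources
    rule_identifier: str  # identifier of the Scancode rule that fired
    matched_text: str     # text matched by that rule


@dataclass(frozen=True)
class LicenseDeleteRule:
    # Each condition is a regular expression; None means "not specified".
    path: Optional[str] = None
    rule_identifier: Optional[str] = None
    matched_text: Optional[str] = None

    def matches(self, finding: LicenseFinding) -> bool:
        def ok(pattern: Optional[str], value: str) -> bool:
            # Assumption: whole-value match; unspecified conditions always pass.
            return pattern is None or re.fullmatch(pattern, value, re.DOTALL) is not None

        return (ok(self.path, finding.path)
                and ok(self.rule_identifier, finding.rule_identifier)
                and ok(self.matched_text, finding.matched_text))


def apply_license_deletes(findings: List[LicenseFinding],
                          rules: List[LicenseDeleteRule]) -> List[LicenseFinding]:
    """Return only the findings that no DELETE rule matches."""
    return [f for f in findings if not any(r.matches(f) for r in rules)]


# Illustrative data only.
findings = [
    LicenseFinding("src/test/resources/sample.txt", "gpl-2.0_12.RULE", "GNU General Public License"),
    LicenseFinding("src/main/java/App.java", "apache-2.0_7.RULE", "Apache License, Version 2.0"),
]
rules = [LicenseDeleteRule(path=r"src/test/.*", rule_identifier=r"gpl-2\.0_.*")]
kept = apply_license_deletes(findings, rules)
```

Because all specified conditions have to match, combining `path` with the rule identifier keeps the rule from suppressing the same license finding in other files.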

ADD of License

Adding new licenses is done by defining rules which add new license info, either to the licenses found in a source file or "on top level".

Conditions:

  • path of the file within the sources (defined as a regular expression; if omitted, the license will be applied on "top level")

Data:

  • license: the SPDX identifier of the license to add
  • url: URL to the license text
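
The ADD rule for licenses could be sketched in Python as follows. The names `LicenseAddRule` and `apply_license_adds` are hypothetical, the sample paths are illustrative, and the sketch assumes that a `path` expression must match the whole file path and that a matching rule adds the license to every matching file.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LicenseEntry = Tuple[str, str]  # (SPDX identifier, URL to the license text)


@dataclass(frozen=True)
class LicenseAddRule:
    license: str                # SPDX identifier of the license to add
    url: str                    # URL to the license text
    path: Optional[str] = None  # regular expression; None means "top level"


def apply_license_adds(file_licenses: Dict[str, List[LicenseEntry]],
                       rules: List[LicenseAddRule]):
    """Return (per-file licenses, top-level licenses) after applying ADD rules."""
    per_file = {p: list(entries) for p, entries in file_licenses.items()}
    top_level: List[LicenseEntry] = []
    for rule in rules:
        entry = (rule.license, rule.url)
        if rule.path is None:
            # No path condition: the license is applied on "top level".
            top_level.append(entry)
            continue
        for path, entries in per_file.items():
            # Assumption: the expression must match the whole path.
            if re.fullmatch(rule.path, path):
                entries.append(entry)
    return per_file, top_level


# Illustrative data only.
file_licenses = {"src/main/java/App.java": [], "lib/vendor/util.js": []}
rules = [
    LicenseAddRule("MIT", "https://opensource.org/licenses/MIT", path=r"lib/vendor/.*"),
    LicenseAddRule("Apache-2.0", "https://www.apache.org/licenses/LICENSE-2.0"),
]
per_file, top_level = apply_license_adds(file_licenses, rules)
```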

DELETE of Copyrights

Deleting found copyrights is done by defining rules which result in ignoring the copyright finding(s) in files within the scanned codebase. The following "conditions" are used for defining a rule:

  • path of the file within the sources (defined as a regular expression)
  • copyright: the found copyright text to ignore (defined as a regular expression)
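
A short Python sketch of the copyright DELETE rule, again with hypothetical names and illustrative data. It assumes whole-value regular expression matching and treats an omitted condition as matching anything.

```python
import re
from typing import List, NamedTuple, Optional


class CopyrightFinding(NamedTuple):
    path: str       # path of the file within the sources
    copyright: str  # copyright text found in that file


class CopyrightDeleteRule(NamedTuple):
    path: Optional[str] = None       # regular expression on the file path
    copyright: Optional[str] = None  # regular expression on the copyright text


def _ok(pattern: Optional[str], value: str) -> bool:
    # Assumption: unspecified conditions pass; specified ones must match fully.
    return pattern is None or re.fullmatch(pattern, value) is not None


def apply_copyright_deletes(findings: List[CopyrightFinding],
                            rules: List[CopyrightDeleteRule]) -> List[CopyrightFinding]:
    """Return only the copyright findings that no DELETE rule matches."""
    return [
        f for f in findings
        if not any(_ok(r.path, f.path) and _ok(r.copyright, f.copyright) for r in rules)
    ]


# Illustrative data only: drop a placeholder copyright from a license template.
copyright_findings = [
    CopyrightFinding("docs/template.txt", "Copyright (c) <year> <owner>"),
    CopyrightFinding("src/main/java/App.java", "Copyright 2024 Example Corp"),
]
delete_rules = [CopyrightDeleteRule(copyright=r"Copyright \(c\) <year> <owner>")]
kept_copyrights = apply_copyright_deletes(copyright_findings, delete_rules)
```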

ADD of Copyright

Adding new copyrights is done by defining rules which add new copyright info, either to the copyrights found in a source file or "on top level".

Conditions:

  • path of the file within the sources (defined as a regular expression; if omitted, the copyright will be applied on "top level")

Data:

  • copyright: the copyright string to add
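
The copyright ADD rule could look like this in Python. `CopyrightAddRule` and `apply_copyright_adds` are hypothetical names, the data is illustrative, and the sketch assumes the `path` expression must match the whole file path.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class CopyrightAddRule(NamedTuple):
    copyright: str              # the copyright string to add
    path: Optional[str] = None  # regular expression; None means "top level"


def apply_copyright_adds(file_copyrights: Dict[str, List[str]],
                         rules: List[CopyrightAddRule]):
    """Return (per-file copyrights, top-level copyrights) after applying ADD rules."""
    per_file = {p: list(entries) for p, entries in file_copyrights.items()}
    top_level: List[str] = []
    for rule in rules:
        if rule.path is None:
            # No path condition: the copyright is applied on "top level".
            top_level.append(rule.copyright)
        else:
            for path, entries in per_file.items():
                if re.fullmatch(rule.path, path):
                    entries.append(rule.copyright)
    return per_file, top_level


# Illustrative data only.
file_copyrights = {"src/main/java/App.java": ["Copyright 2024 Example Corp"]}
add_rules = [
    CopyrightAddRule("Copyright 2023 Example Contributors", path=r"src/main/.*"),
    CopyrightAddRule("Copyright 2024 Example Project Authors"),
]
copyrights_per_file, top_level_copyrights = apply_copyright_adds(file_copyrights, add_rules)
```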

Acceptance criteria

  • Rules for deleting/adding licenses and copyrights are implemented and can be used.
  • The user guide of Solicitor is updated.

ohecker • May 07 '24 06:05