scancode-toolkit icon indicating copy to clipboard operation
scancode-toolkit copied to clipboard

Update rules with required phrases automatically

Open AyanSinhaMahapatra opened this issue 1 year ago • 1 comments

This is a continuation of https://github.com/aboutcode-org/scancode-toolkit/pull/3254 with added required phrases in license rules after review and further manual curations. Also contains improvements in required phrase collection and marking.

Reference: #2637 #2878

Tasks

  • [x] Reviewed contribution guidelines
  • [x] PR is descriptively titled 📑 and links the original issue above 🔗
  • [ ] Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR Run tests locally to check for errors.
  • [x] Commits are in uniquely-named feature branch and has no merge conflicts 📁

AyanSinhaMahapatra avatar Sep 17 '24 08:09 AyanSinhaMahapatra

I am pushing shortly a few updates:

  • decouple the creation of new rules from updating existing rules in a separate CLI
  • ensure we skip more rules in the whole process: any rule that cannot be matched approximately and not only tiny rules, and also false positives
  • ensure that no rule get a required phrase addition that would break in the middle of a URL, email, or copyright. This will be done to check that no required phrase injection changes the set of ignorables of a rule and makes the URL not longer a proper URL for instance.
  • extend "skipping" the collection of required phrases flag to skip a rule from both required phrases collection AND injection. This allow to handle exceptions more easily.

pombredanne avatar Oct 04 '24 12:10 pombredanne

Hey! This PR looks promising as I think it has the potential to greatly reduce false positives. Just to drop an example, having required phrases throughout the rule set would avoid "silly" mistakes like this:

"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/lgpl-2.1_250.RULE",
"matched_text": "// Licensed under the MIT license. \nnamespace Microsoft.OpenApi.Readers.V2"

And I do see quite a lot of these "ghost" detections.

I was just checking in to ask if there is any hope of seeing this PR merged. It has been open for some time.

petergardfjall avatar Feb 19 '25 14:02 petergardfjall

@petergardfjall re:

I was j ust checking in to ask if there is any hope of seeing this PR merged. It has been open for some time.

Yes! I am about to have a chat with @AyanSinhaMahapatra on just that.

pombredanne avatar Feb 19 '25 14:02 pombredanne

The longer story is that there are reconciliations to do wrt. the changes I made and changes that @AyanSinhaMahapatra has not yet pushed.

pombredanne avatar Feb 19 '25 14:02 pombredanne

@AyanSinhaMahapatra all green... I am merging now :bow:

pombredanne avatar Apr 10 '25 07:04 pombredanne