scancode-toolkit

Invalid copyright not detected

Open pombredanne opened this issue 1 year ago • 11 comments

"[C] The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved" is a rare form and is not detected because "[C]" is not a valid copyright "sign".

We have a few other cases in https://github.com/search?q="Copyright+[C]"&type=code

The only sane resolution I can think of is to normalize these warts in text preparation:

  • replace "[C] The Regents of the University" with "(C) The Regents of the University"
  • replace "Copyright [c]" with "Copyright (c)", regardless of character case.

Treating a bare [C] as a valid sign everywhere is not an option: it would trigger a bazillion false positives, as seen in https://github.com/search?q="[C]"&type=code (actually only millions, not bazillions).
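A minimal sketch of the normalization proposed above, assuming it runs as a plain text-preparation step before detection. The helper name `normalize_bracket_c` and both regular expressions are hypothetical and are not taken from the scancode-toolkit code base; they only illustrate rewriting "[C]" in the two narrow contexts listed while leaving any other "[C]" untouched.

```python
import re

# Assumption: "[C]" is rewritten only in the two contexts named in the issue,
# so unrelated "[C]" tokens (array indexes, tags, ...) are never touched.
# Context 1: "[c]" directly after the word "Copyright", in any character case.
_AFTER_COPYRIGHT = re.compile(r"(copyright\s+)\[(c)\]", re.IGNORECASE)
# Context 2: "[C]" directly before "The Regents of the University".
_BEFORE_REGENTS = re.compile(
    r"\[(c)\](?=\s+The Regents of the University)", re.IGNORECASE
)


def normalize_bracket_c(text):
    """Hypothetical helper: turn "[C]"/"[c]" into "(C)"/"(c)", keeping the letter case."""
    text = _AFTER_COPYRIGHT.sub(r"\1(\2)", text)
    text = _BEFORE_REGENTS.sub(r"(\1)", text)
    return text


if __name__ == "__main__":
    print(normalize_bracket_c("Copyright [c] 2001 Example Corp."))
    print(normalize_bracket_c("[C] The Regents of the University of Michigan"))
    print(normalize_bracket_c("value = table[C] + 1"))  # left unchanged
```

Keeping the patterns this narrow is a deliberate choice in the sketch: a blanket replacement of "[C]" would reintroduce the false positives described above.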

pombredanne, Feb 19 '24 11:02

I'm taking this one @pombredanne

vaibhavyadav-dev, Feb 19 '24 15:02

@pombredanne If I understand correctly, I just have to make the changes you suggest. Or does this require anything else? If so, please tell me. I'm a beginner and want to make good contributions.

vaibhavyadav-dev, Feb 19 '24 16:02

@CaptainTron You could start by crafting the unit tests that fail for now. Then check this: https://github.com/nexB/scancode-toolkit/blob/79aae3481833de80913383b2aa21fc8cdfb9813a/src/cluecode/copyrights.py#L3987
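As a rough illustration of "unit tests that fail for now", here is a self-contained sketch. The `toy_detect` function is a hypothetical stand-in that only recognizes a "(c)" sign; it is not scancode-toolkit's detector or API, and the project's own test layout may differ. The point is only the shape: write the test for the "[C]" statement first, mark it as an expected failure, and drop the marker once the fix lands.

```python
import re
import unittest


def toy_detect(text):
    """Hypothetical stand-in detector (NOT scancode's API): keep lines with a "(c)" sign."""
    return [
        line.strip()
        for line in text.splitlines()
        if re.search(r"\(c\)", line, re.IGNORECASE)
    ]


class BracketCSignTest(unittest.TestCase):

    def test_parenthesized_sign_is_detected(self):
        # Baseline: the usual "(C)" sign is already recognized.
        text = "(C) The Regents of the University of Michigan and Merit Network, Inc. 1992"
        self.assertTrue(toy_detect(text))

    @unittest.expectedFailure  # remove this marker once "[C]" is handled
    def test_bracketed_sign_is_detected(self):
        # This is the case from the issue: it fails until "[C]" is normalized.
        text = "[C] The Regents of the University of Michigan and Merit Network, Inc. 1992"
        self.assertTrue(toy_detect(text))
```

Saved as a module, this can be run with `python -m unittest`; the second test is reported as an expected failure until the detector handles "[C]".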

pombredanne, Feb 19 '24 17:02

@CaptainTron You could start by crafting the unit tests that fail for now. Then check this:

https://github.com/nexB/scancode-toolkit/blob/79aae3481833de80913383b2aa21fc8cdfb9813a/src/cluecode/copyrights.py#L3987

@pombredanne Can you elaborate a bit more? I don't yet understand which unit tests to look for. I've gone through that line and the docs, but I'm still confused.

vaibhavyadav-dev, Feb 20 '24 14:02