license-checker-rseidelsohn icon indicating copy to clipboard operation
license-checker-rseidelsohn copied to clipboard

Licensed under "Custom: https://github.com..."

Open fredrikaverpil opened this issue 1 year ago • 14 comments

Hi,

Have you ever seen something like this after running the checker in GitHub Actions?

Package "[email protected]" is licensed under "Custom: https://github.com/MY-ORG/MY-REPO/actions/workflows/test.yml/badge.svg" which is not permitted by the --onlyAllow flag. Exiting.
Error: Process completed with exit code 1.

I can provide more info but figured I'd start like this. My project is using npm workspaces which might be related to this...

fredrikaverpil avatar Jan 31 '23 15:01 fredrikaverpil

Okay, so I managed to solve it. My root package.json file looked like this:

{
  "name": "mypkg",
  "version": "1.0.0",
  "description": "Root npm project, only meant to control workspaces.",
  "workspaces": [
    "packages/mypkg"
  ],
  "scripts": {
    "blah blah": "random command here"
  },
  "devDependencies": {
    "openapi-typescript-codegen": "^0.23.0"
  }
}

And all I had to do was insert the license in the json to get rid of that weird error:

{
  "name": "mypkg",
  "version": "1.0.0",
  "description": "Root npm project, only meant to control workspaces.",
+  "license": "MYLICENSE-HERE",
  "workspaces": [
    "packages/mypkg"
  ],
  "scripts": {
    "blah blah": "random command here"
  },
  "devDependencies": {
    "openapi-typescript-codegen": "^0.23.0"
  }
}

fredrikaverpil avatar Jan 31 '23 15:01 fredrikaverpil

However, I'm now noticing that the checker is unable to check for dependencies underneath this npm workspace.

fredrikaverpil avatar Jan 31 '23 15:01 fredrikaverpil

Hi @fredrikaverpil , thanks for informing me about this. I've never heard of npm workspaces before and I will have a more detailed look into the documentation - thank you for linking it in your comment.

Indeed, the license-checker-rseidelsohn (oh my, I really dislike this clumsy name, but I guess it's best to not change it again. Some of my colleagues suggested using a name space: rseidelsohn@license-checker, but this wouldn't make it more handy, I think) can not handle workspaces - yet it looks like a nice and useful feature to add.

I do not promise anything here, but I plan on dealing with it the next days. Also, of course, feel free to create a pull request on your own if you feel like it. I always appreciate any kind of input - be it a pull request or just a question or note.

Cheers, Roman.

RSeidelsohn avatar Jan 31 '23 15:01 RSeidelsohn

@RSeidelsohn there's an open issue about this here: https://github.com/RSeidelsohn/license-checker-rseidelsohn/issues/36

fredrikaverpil avatar Jan 31 '23 15:01 fredrikaverpil

Hi,

We are also seeing this occur an increasing amount of times in various of our repos as we've rolled out license checking in more of our JS repos.

Another example

Reproduction:

  1. Create a directory somewhere
  2. Add a package.json package.json
{
  "dependencies": {
    "cycle": "^1.0.3"
  }
}
  1. npm install
  2. npx license-checker-rseidelsohn

Output

└─ cycle@@1.0.3
   ├─ licenses: Custom: https://github.com/douglascrockford/JSON-js
   ├─ repository: https://github.com/dscape/cycle
   ├─ path: /Code/tmp/license/node_modules/cycle
   └─ licenseFile: /Code/tmp/license/node_modules/cycle/README.md

In the published version of cycle there's actually no license specified in package.json or README.md. Its a bit unclear how it finds the first random URL to match instead of saying its unknown or something.

We've seen similar errors where some other package which has a proprietary license suddenly get parsed as having a https://badge.fury.io/js/ because the first URL like thing in the readme is markdown for a badge icon. In that packages case there is actually a license field set in the package.json, to an URL.

Flydiverny avatar Mar 21 '23 09:03 Flydiverny

@RSeidelsohn After some debugging and digging in the code when there is a custom license specified, for example a URL in package.json It will later fall into this code block: https://github.com/RSeidelsohn/license-checker-rseidelsohn/blob/d95e43e33299b75d5d7e3da6b6529aeed237f697/lib/index.js#L195-L204

Which will then override the already correctly detected license from the license field.

Flydiverny avatar Mar 21 '23 10:03 Flydiverny

#71 shows a failing test for this

Flydiverny avatar Mar 21 '23 10:03 Flydiverny

I think there are two parts to this. Not extracting URLs from the license files and README.md which is checked in different places Like this code block and the previously linked one in comment above https://github.com/RSeidelsohn/license-checker-rseidelsohn/blob/d95e43e33299b75d5d7e3da6b6529aeed237f697/lib/index.js#L162-L164

Then somehow deciding when to check these additional files or not if there already is a custom URL detected? Is there any case where the custom license actually does need to be overriden? Like currently Custom: URL and Custom: FILE_REF are the same output. Maybe custom URL shouldn't look in additional files but FILE_REF should?

Flydiverny avatar Mar 21 '23 10:03 Flydiverny

Sorry guys, this is a flaw I have to fix. I'll dig into this code next - unfortunately, this code was neither created by me, nor did I get a hand-over. So I have to try to understand what's going on here and why so and only then I'll be able to fix it. Should anyone of you or someone else feel the urge to provide a fix: PR's are much appreciated!

RSeidelsohn avatar Apr 01 '23 15:04 RSeidelsohn

Hey @Flydiverny and @fredrikaverpil ,

I am finally on it! I had to refactor quite some of the still old logic, which helped me understand more and even find and fix another bug. I'm now quite near, but still, please bear with me. Holidays are approaching and I guess I won't do much during that time. Still, I've already fixed the failing test you, @Flydiverny, thankfully brought in and now I have to fix one old test that is now failing. But enough for today. Just wanted to give you an update. Your code is merged, but no new release available yet.

Cheers, Roman.

RSeidelsohn avatar Apr 02 '23 17:04 RSeidelsohn

Thanks a lot for having a look at it @RSeidelsohn! Enjoy the holidays :) 🐣

Flydiverny avatar Apr 02 '23 19:04 Flydiverny

Hi @Flydiverny and @fredrikaverpil ,

a short update: I have worked a lot on the code during my vacation and just released a bugfix version 4.2.1, which at least pleases the new test you added, @Flydiverny , but still does not respect npm workspaces (I guess - I did not test this yet). I refactored a huge part of the code which is still quite monolithic in terms of way too big functions. This is a heritage from the original license-checker, which I once took over. Refactoring this code by introducing speaking and descriptive function and variable names and splitting up the way too large functions into smaller chunks will still need a lot of time, but now it's again way better than before and hopefully enables me to do bug fixes and create new functionality faster and easier in the future.

RSeidelsohn avatar Apr 15 '23 10:04 RSeidelsohn

Hi,

We are also seeing this occur an increasing amount of times in various of our repos as we've rolled out license checking in more of our JS repos.

Another example

Reproduction:

  1. Create a directory somewhere
  2. Add a package.json package.json
{
  "dependencies": {
    "cycle": "^1.0.3"
  }
}
  1. npm install
  2. npx license-checker-rseidelsohn

Output

└─ cycle@@1.0.3
   ├─ licenses: Custom: https://github.com/douglascrockford/JSON-js
   ├─ repository: https://github.com/dscape/cycle
   ├─ path: /Code/tmp/license/node_modules/cycle
   └─ licenseFile: /Code/tmp/license/node_modules/cycle/README.md

In the published version of cycle there's actually no license specified in package.json or README.md. Its a bit unclear how it finds the first random URL to match instead of saying its unknown or something.

We've seen similar errors where some other package which has a proprietary license suddenly get parsed as having a https://badge.fury.io/js/ because the first URL like thing in the readme is markdown for a badge icon. In that packages case there is actually a license field set in the package.json, to an URL.

Ok, the whole functionality for automatically finding a license in a README file is pretty doomed. Currently, the license-checker just looks for the first URL in the README, which leads to the situation you mention. This doesn't make any sense and I have to think of a way to improve on this.

Update

I released a kind of a fix for it in version 4.2.4 - URLs in READMEs will now only be taken as a last resort if the READMEs at least include the term "license". This fixes already a lot of false hits.

RSeidelsohn avatar Apr 16 '23 15:04 RSeidelsohn

I don't know if this helps in the long run, but I didn't like how the --files output [email protected] documents were sometimes including the full README markdown from dependency package that just happened to include a license header. I was also unable to get any urls included in my output.json document, even though the property was specified in my customFormatExample.json, but I may have been doing something wrong there.

So I wrote two regexes to extract everything under a license markdown header - one for

  1. ## header syntax
  2. ---- or ==== underline syntax

e.g.

## license MIT / whatever

or

license ------- etc

and

license ====== etc

They only extract the text until the next header in the README, or the end of file.

They are in .NET flavour but probably work for other regex flavours, just strip out the ?<license> and ?<header> tags, which is .NET specific syntax.

  1. (?smi)(?<header>^#+\s+licen[sc]es?\s*)(?<license>.+?(?=#+|\z))
  2. (?smi)(?<header>^\slicen[sc]es?.\n[=-]+\n)(?<license>.+?(?==+|-{2,}|\z))

An example of 1 at regex101 and 2 at regex101

  • In my project, the these [email protected] license files only had 'MIT' underneath the license header anyway, so not much was gained. And there were only a few packages that were outputting the full README markdown. But they didn't match against any 'proper' license texts, which was important. But they have not been tested against enough examples to be fully trusted, they just worked for my project with 100+ dependencies of varying kinds of license output.

  • I don't know much about the license resolution implementation in license-checker-rseidelsohn, but I don't think examining README files should be a fallback option. If the license type is known but the license text is not, I think it would be better to just grab the raw examples from the spdx repository and output those as the license text instead.

  • Disclaimer, this open-source licensing stuff is all very new to me, so please excuse any newbie knowledge I haven't understood.

zola-25 avatar Aug 13 '23 19:08 zola-25