scancode-toolkit

Install custom licenses using wheels

Open KevinJi22 opened this issue 2 years ago • 7 comments

This will enable users to install custom licenses using wheels during license detection.

KevinJi22 Jun 15 '22 16:06

Let me try to put a use case:

  • there are a few special licenses that I want to detect.
  • to make them easy to use in CI and a few other places, I put them in their own repo, which is then built as a wheel and released on PyPI
  • when I need them, I install them alongside scancode with a single pip command.

A possible case could be a collection of licenses in another language, say in German, or a collection of proprietary licenses.

pombredanne Jun 15 '22 16:06

So what functionality does this feature actually need to provide? Should I assume that the licenses have already been installed?

  • Would I need to add a new CLI option for the user to specify the installed licenses they want to include?
  • After the licenses are installed, what does the code need to do? Does it need to find the locations of those installed licenses and use them in license detection under the hood?

KevinJi22 Jun 15 '22 16:06

So what functionality does this feature actually need to provide? Should I assume that the licenses have already been installed?

Would I need to add a new CLI option for the user to specify the installed licenses they want to include?

In this case, not sure. There could be two designs:

  • automatically use all available licenses, whether they are contributed by a plugin or part of the standard index
  • have each plugin expose a CLI option that would need to be provided for the contributed licenses to be used

After the licenses are installed, what does the code need to do? Does it need to find the locations of those installed licenses and use them in license detection under the hood?

That's the most likely scenario. A good example of this would be the "path providers" plugins that provide a path to a binary for instance:

  • https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-linux
  • https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-macosx
  • https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-win64
  • https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic_system_provided

with an extension point defined here:

  • https://github.com/nexB/plugincode/blob/main/src/plugincode/location_provider.py

and used here:

  • https://github.com/nexB/typecode/blob/3a393ad28fbb0402d96077950eb2fe313c5eb607/src/typecode/magic2.py#L167
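As a rough illustration of the "path provider" pattern described above, here is a minimal, self-contained sketch. It does not import plugincode: `LocationProvider` is a stand-in base class, and `ExampleLicensesProvider`, `find_location()` and the `example.licenses.dir` key are hypothetical names chosen for this example only.

```python
from pathlib import Path


class LocationProvider:
    """Stand-in base class for this sketch (not the real plugincode class)."""

    def get_locations(self):
        """Return a mapping of location key -> filesystem path."""
        raise NotImplementedError


class ExampleLicensesProvider(LocationProvider):
    """Hypothetical plugin that ships license data files inside its package."""

    def __init__(self, package_dir):
        # A real plugin would typically derive this from its own module path.
        self.package_dir = Path(package_dir)

    def get_locations(self):
        return {"example.licenses.dir": str(self.package_dir / "licenses")}


def find_location(key, providers):
    """Return the path registered for key by the first provider that knows it."""
    for provider in providers:
        locations = provider.get_locations()
        if key in locations:
            return locations[key]
    return None
```

The consumer only needs to know the key, not which installed package provides the path, which is what makes the pattern attractive for licenses shipped in a separate wheel.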

pombredanne Jun 15 '22 17:06

In this case, not sure. There could be two designs:

  • automatically use all available licenses, whether they are contributed by a plugin or part of the standard index

This seems like the simplest approach to me since then we could just add the licenses contributed by the plugins to the index once, and then after caching the index, we continue using that file until we need to add new licenses. I'll go with this unless there are any objections.
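One way to picture "build once, cache, and reuse until new licenses are added" is sketched below. This is an assumption-laden illustration, not how ScanCode's index cache actually works: `fingerprint()`, `load_or_build_index()`, the JSON cache file and the `build_index` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(license_dirs):
    """Hash the relative paths and contents of all files under the directories."""
    digest = hashlib.sha256()
    for directory in sorted(str(d) for d in license_dirs):
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                digest.update(path.relative_to(directory).as_posix().encode("utf-8"))
                digest.update(path.read_bytes())
    return digest.hexdigest()


def load_or_build_index(license_dirs, cache_file, build_index):
    """Return (index, rebuilt), reusing the cache unless the license files changed.

    build_index is a caller-supplied function that must return JSON-serializable data.
    """
    cache_file = Path(cache_file)
    current = fingerprint(license_dirs)
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if cached.get("fingerprint") == current:
            return cached["index"], False
    index = build_index(license_dirs)
    cache_file.write_text(json.dumps({"fingerprint": current, "index": index}))
    return index, True
```

With this shape, installing or removing a license plugin changes the fingerprint and triggers one rebuild; every later run reads the cached file.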

That's the most likely scenario. A good example of this would be the "path providers" plugins that provide a path to a binary for instance:

It looks like each plugin class implements its own get_locations() method, which hard-codes the path. So it seems like the things I'll need to do are:

  • implement a get_location() method to get the location of the installed licenses. For this, I might be able to do something like this, using site.getusersitepackages() to get the package path.
  • implement a method similar to load_lib() that you linked, which allows me to load the licenses and add them to the index.

Does this sound right?

KevinJi22 Jun 16 '22 02:06

@KevinJi22 this sounds right :+1: Using __file__ and navigating from there to the directory that contains the licenses is usually more robust than site.getusersitepackages() or similar. Fewer moving parts and simpler too.
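A minimal sketch of the __file__ approach, assuming the wheel ships its data in a `licenses` subdirectory next to the plugin module (the `licenses_dir()` helper and the subdirectory name are hypothetical):

```python
from pathlib import Path


def licenses_dir(module_file, subdir="licenses"):
    """Return the data directory that sits next to the given module file."""
    return Path(module_file).resolve().parent / subdir


# Inside the plugin module this would be called as: licenses_dir(__file__)
```

This works wherever the package ends up installed (virtualenv, system site-packages, or user site), whereas site.getusersitepackages() only names the per-user site directory.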

pombredanne Jun 16 '22 16:06

@pombredanne I see two paths forward:

  1. make all the external license plugins provide a scancode_location_provider entry point. This way, we can reuse the get_location() method in PluginCode since it is initialized with all plugins of the scancode_location_provider group. However, this means the installed license plugins will be grouped with the builtin plugins (e.g. the TypeCode plugins), and we might need to figure out an additional way to filter out the external licenses. This could be done using a name constraint (e.g. all external license plugins must start with the same string) but having a consistent naming convention might be too much effort to maintain for users.
  2. make all the external license plugins provide a different entry point, like scancode_external_license. This means we can't reuse the get_location() method, so we'd have to basically duplicate that for this use case. The advantage of this approach is that we know all the plugins would contain licenses, so we could just iterate over them without doing any other checking.

Any thoughts on which is best?

KevinJi22 Jun 20 '22 16:06

@KevinJi22 Part of me is not sold on creating a different entry point for external licenses just yet. I would go with the first approach you listed and mandate that custom licenses start with a common prefix, like scancode_licenses.
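To make the two options concrete, here is a hedged sketch using the standard library's importlib.metadata. The helper names `entry_points_for()` and `select_license_plugins()` are hypothetical; the group and prefix strings are the ones mentioned in this thread.

```python
from importlib.metadata import entry_points


def entry_points_for(group):
    """Return the installed entry points registered under the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 and 3.9 return a dict


def select_license_plugins(eps, prefix="scancode_licenses"):
    """Option 1: keep only the entry points whose name starts with the prefix."""
    return [ep for ep in eps if ep.name.startswith(prefix)]


# Option 1: shared group, filtered by name prefix.
shared = select_license_plugins(entry_points_for("scancode_location_provider"))

# Option 2: dedicated group, no filtering needed.
dedicated = entry_points_for("scancode_external_license")
```

Under option 1 a plugin that forgets the prefix is silently skipped, which is the maintenance cost raised earlier; under option 2 the group itself carries that information.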

JonoYang Jun 24 '22 22:06