scancode-toolkit
Install custom licenses using wheels
This will enable users to install custom licenses using wheels during license detection.
Let me try to put together a use case:
- there are a few special licenses that I want to detect.
- to make them easier to use in CI and a few other places, I put these in their own repo, which is then built as a wheel and released on PyPI
- when I need to use this, I install this with scancode with a single pip command.
A possible case could be a collection of licenses in another language, say in German, or a collection of proprietary licenses.
So what functionality does this feature actually need to provide? Should I assume that the licenses have already been installed?
- Would I need to add a new CLI option for the user to specify the installed licenses they want to include?
- After the licenses are installed, what does the code need to do? Does it need to find the locations of those installed licenses and use them in license detection under the hood?
> So what functionality does this feature actually need to provide? Should I assume that the licenses have already been installed?
> Would I need to add a new CLI option for the user to specify the installed licenses they want to include?
In this case, not sure. There could be two designs:
- use automatically all the licenses available even if contributed by a plugin or part of the standard index
- have each plugin expose a CLI option that would need to be provided for the contributed licenses to be used
> After the licenses are installed, what does the code need to do? Does it need to find the locations of those installed licenses and use them in license detection under the hood?
That's the most likely scenario. A good example of this would be the "path providers" plugins that provide a path to a binary for instance:
- https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-linux
- https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-macosx
- https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic-win64
- https://github.com/nexB/scancode-plugins/tree/main/builtins/typecode_libmagic_system_provided
with an extension point defined here:
- https://github.com/nexB/plugincode/blob/main/src/plugincode/location_provider.py
and used here:
- https://github.com/nexB/typecode/blob/3a393ad28fbb0402d96077950eb2fe313c5eb607/src/typecode/magic2.py#L167
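As a rough illustration of the "path provider" pattern, here is a minimal, self-contained sketch. The base class below is only a local stand-in for the one defined in plugincode's location_provider module, and the key my_licenses.dir, the class ExternalLicensesPaths and the helpers locations_for() and lookup_location() are hypothetical names made up for this example.

```python
from os.path import abspath, dirname, join


class LocationProviderPlugin:
    """Local stand-in for the plugincode base class (assumption)."""

    def get_locations(self):
        raise NotImplementedError


def locations_for(module_file):
    """Return a {key: path} mapping rooted at the directory of module_file."""
    base_dir = dirname(abspath(module_file))
    # "my_licenses.dir" is a hypothetical key, not one ScanCode defines.
    return {"my_licenses.dir": join(base_dir, "licenses")}


class ExternalLicensesPaths(LocationProviderPlugin):
    """Hypothetical provider exposing a bundled licenses directory."""

    def get_locations(self):
        return locations_for(__file__)


def lookup_location(key, providers):
    """Merge the mappings of all providers and return the path for key, or None."""
    merged = {}
    for provider in providers:
        merged.update(provider.get_locations())
    return merged.get(key)
```

A consumer would then ask for a key rather than a hard-coded path, which is the part of the design that lets a separately installed wheel contribute its own directory.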
> In this case, not sure. There could be two designs:
> - use automatically all the licenses available even if contributed by a plugin or part of the standard index
This seems like the simplest approach to me since then we could just add the licenses contributed by the plugins to the index once, and then after caching the index, we continue using that file until we need to add new licenses. I'll go with this unless there are any objections.
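To make the "build once, then reuse the cached file" idea concrete, here is a small sketch. It is not how the real ScanCode license index is built or cached: build_index() is a deliberately trivial stand-in, and the *.LICENSE file pattern, the function names and the force flag are all assumptions for illustration.

```python
import pickle
from pathlib import Path


def build_index(license_dirs):
    """Hypothetical stand-in for index construction: map file stem to text."""
    index = {}
    for directory in license_dirs:
        for path in sorted(Path(directory).glob("*.LICENSE")):
            index[path.stem] = path.read_text(encoding="utf-8")
    return index


def load_or_build_index(cache_file, license_dirs, force=False):
    """Reuse the pickled index if it exists, else build it and cache it."""
    cache_file = Path(cache_file)
    if cache_file.exists() and not force:
        with cache_file.open("rb") as f:
            return pickle.load(f)
    index = build_index(license_dirs)
    with cache_file.open("wb") as f:
        pickle.dump(index, f)
    return index
```

With this shape, newly installed license plugins are only picked up when the cache is rebuilt (here, by passing force=True), which matches the "until we need to add new licenses" caveat.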
> That's the most likely scenario. A good example of this would be the "path providers" plugins that provide a path to a binary for instance:
It looks like each plugin class implements its own get_locations() method, which hard-codes the path. So it seems like the things I'll need to do are:
- implement a get_location() method to get the location of the installed licenses. For this, I might be able to do something like this, using site.getusersitepackages() to get the package path.
- implement a method similar to load_lib() that you linked, which allows me to load the licenses and add them to the index.
Does this sound right?
@KevinJi22 this sounds right :+1:
Using __file__ and navigating from there to the directory that contains the licenses is usually more robust than site.getusersitepackages() or similar. Fewer moving parts and simpler too.
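For what it's worth, one reason the __file__ approach tends to hold up better is that site.getusersitepackages() only returns the per-user site-packages directory, which is not where a wheel lands when it is installed into a virtualenv. A minimal sketch of the __file__-relative lookup follows; the licenses subdirectory name, the *.LICENSE pattern and the find_license_files() helper are assumptions for this example, not an existing ScanCode layout or API.

```python
from pathlib import Path


def find_license_files(module_file, subdir="licenses", pattern="*.LICENSE"):
    """
    Return the sorted license files found in `subdir`, next to module_file.
    Return an empty list if that directory does not exist.
    """
    licenses_dir = Path(module_file).resolve().parent / subdir
    if not licenses_dir.is_dir():
        return []
    return sorted(licenses_dir.glob(pattern))
```

Inside the plugin package this would be called as find_license_files(__file__), so the result follows the package wherever pip happened to install it.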
@pombredanne I see two paths forward:
- make all the external license plugins provide a scancode_location_provider entry point. This way, we can reuse the get_location() method in PluginCode since it is initialized with all plugins of the scancode_location_provider group. However, this means the installed license plugins will be grouped with the builtin plugins (e.g. the TypeCode plugins), and we might need to figure out an additional way to filter out the external licenses. This could be done using a name constraint (e.g. all external license plugins must start with the same string) but having a consistent naming convention might be too much effort to maintain for users.
- make all the external license plugins provide a different entry point, like scancode_external_license. This means we can't reuse the get_location() method, so we'd have to basically duplicate that for this use case. The advantage of this approach is that we know all the plugins would contain licenses, so we could just iterate over them without doing any other checking.
Any thoughts on which is best?
@KevinJi22 Part of me is not sold on creating a different entry point for external licenses just yet. I would go with the first approach you listed and mandate that custom licenses start with a common prefix, like scancode_licenses.
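A rough sketch of what the prefix filter could look like is below. It uses the standard library's importlib.metadata directly rather than PluginCode's own plugin manager, and it assumes the prefix is checked against the entry point name (it could equally be the package name). The helper names is_license_plugin() and license_plugin_entry_points() are hypothetical.

```python
from importlib.metadata import entry_points

GROUP = "scancode_location_provider"
PREFIX = "scancode_licenses"  # naming convention suggested in this thread


def is_license_plugin(name, prefix=PREFIX):
    """Return True if an entry point name follows the license prefix convention."""
    return name.startswith(prefix)


def license_plugin_entry_points(group=GROUP, prefix=PREFIX):
    """Return the installed entry points of `group` whose name starts with `prefix`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions return an object with a select() method.
        candidates = eps.select(group=group)
    else:
        # Older versions return a plain mapping of {group: entry points}.
        candidates = eps.get(group, [])
    return [ep for ep in candidates if is_license_plugin(ep.name, prefix)]
```

The builtin providers (e.g. the TypeCode ones) would simply fail the prefix check, so no second entry point group is needed.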