
:sparkles: Add --detect-licenses flag

Open ddelange opened this issue 3 years ago • 3 comments

This draft will most likely not be finished: licensing is a rabbit hole that is practically impossible to get right, because the ecosystem lacks strictness.

As pipgrip only has access to wheels, many licenses will not be present (see code comments for examples); retrieving them would call for a source distribution fallback, which is not trivial.

Assuming authors distribute their packages correctly, legal files should be present in wheels (ref https://wheel.readthedocs.io/en/stable/user_guide.html#including-license-files-in-the-generated-wheel-file found at https://jwodder.github.io/kbits/posts/pypkg-mistakes/#top-level-readme-or-license-file-in-wheel). Sadly, this is not the case in practice: even pip's vendored licenses aren't reproduced in the pip wheel.
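To illustrate what "present in wheels" means, here is a minimal sketch (not pipgrip's implementation) that lists legal-looking files inside a wheel's `.dist-info` directory. The `find_legal_files` helper and the `LEGAL_PATTERNS` tuple are hypothetical names, and the file name patterns are an assumption about common conventions, not an official list.

```python
import fnmatch
import zipfile

# Assumption: common names for legal files; this is not an official list.
LEGAL_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def find_legal_files(wheel):
    """Return archive paths under a *.dist-info directory that look like legal files.

    `wheel` is a path or a binary file-like object; a wheel is a plain zip archive.
    """
    hits = []
    with zipfile.ZipFile(wheel) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Skip directory entries and anything outside the .dist-info directory.
            if name.endswith("/") or not parts[0].endswith(".dist-info"):
                continue
            basename = parts[-1].upper()
            if any(fnmatch.fnmatchcase(basename, pat) for pat in LEGAL_PATTERNS):
                hits.append(name)
    return sorted(hits)
```

An empty result for a given wheel is exactly the situation described above: the wheel does not reproduce its legal files, and a fallback would be needed.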

ddelange avatar Feb 08 '21 15:02 ddelange

@ddelange I suppose the license information would be available somewhere, right? I mean at the repo level. If setup.py contains a link to the repo, we can use APIs to query this information, just saying.

jdvala avatar Jun 19 '21 06:06 jdvala

Hi @jdvala :boom:

Indeed, the license can often be found in the repo (or its metadata). The repo (hosted source) can be any kind of VCS hosting, or even a plain HTML sitemap or so.

Elaborating on the inline comment: technically speaking, if the license is missing from the wheel (the distribution that pipgrip installs and the user executes), for most licenses that counts as a failure to reproduce the license. This violation aside, there is at this point technically no guarantee that the license you pick up from another distribution (e.g. the hosted source, or an sdist downloaded from PyPI) corresponds to the distribution on your system. Usually, a license is valid only as full text, delivered alongside the actual distribution or embedded in each file or so. There are also other legal files like AUTHORS, which might be required to build a complete/valid 'license info package' (so more than just 'pipgrip': 'BSD-3') for a distribution you want to run.

Some existing tools I've seen provide some 'confidence level' for their license labels, and mostly won't be able to back that up with the license full text for that specific version. I guess under the 'something is better than nothing' philosophy, and given the lack of licensing standardisation in the Python ecosystem, this technique of falling back to e.g. hosted source, PyPI warehouse metadata, source distributions (sdist) etc. is the best alternative currently.
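As a rough sketch of such a metadata fallback (again, not what pipgrip does), the snippet below pulls license hints out of a core-metadata document such as a wheel's `METADATA` or an sdist's `PKG-INFO`. The `license_hints` helper is a hypothetical name. The hints are just labels: they carry no full text and no guarantee that they match the installed distribution.

```python
from email.parser import Parser


def license_hints(metadata_text):
    """Collect (source, label) license hints from a core-metadata document."""
    # Core metadata uses RFC 822 style headers, so the stdlib email parser can read it.
    msg = Parser().parsestr(metadata_text)
    hints = []
    for field in ("License-Expression", "License"):
        value = msg.get(field)
        # Many packages leave the field empty or set it to the placeholder "UNKNOWN".
        if value and value.strip().upper() != "UNKNOWN":
            hints.append((field, value.strip()))
    for classifier in msg.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            hints.append(("Classifier", classifier.split("::")[-1].strip()))
    return hints
```

A caller would probably treat a `Classifier` hint as lower confidence than an explicit `License-Expression`, in the spirit of the 'confidence level' mentioned above.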

ddelange avatar Jun 24 '21 01:06 ddelange