license-cop
[AG-26] Detect licenses in README, LICENSE or COPYING files on GitHub
In some cases, the GitHub API is not able to detect the license of a given package. However, it is still possible to find the license by analysing the README, COPYING or LICENSE files in the repository. Examples:
- https://github.com/jakerella/jquery-mockjax#license
- https://github.com/behnam/region-flags/blob/gh-pages/COPYING
Please be aware that these files can have no extension, or a Markdown (.md), plain text (.txt), reStructuredText (.rst) or even HTML (.html) extension.
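To make the file-name rule concrete, here is a minimal sketch of how candidate files could be recognised. The function name `is_license_candidate` and the case-insensitive comparison are assumptions for illustration, not part of the existing code base.

```python
import os

# Base names and extensions mentioned in this ticket.
CANDIDATE_STEMS = {"readme", "license", "copying"}
CANDIDATE_EXTENSIONS = {"", ".md", ".txt", ".rst", ".html"}


def is_license_candidate(filename):
    """Return True if the file name looks like README, LICENSE or COPYING.

    Hypothetical helper: matching is case-insensitive (an assumption) and
    only the last path component is considered.
    """
    stem, extension = os.path.splitext(os.path.basename(filename))
    return (
        stem.lower() in CANDIDATE_STEMS
        and extension.lower() in CANDIDATE_EXTENSIONS
    )
```

For example, `is_license_candidate("COPYING")` and `is_license_candidate("README.md")` are both true, while `is_license_candidate("setup.py")` is false.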
We could search these files for known license names, such as "MIT", "GPL", "Public Domain", "Apache License Version 2.0", etc.
In fact, we can build a comprehensive database of known licenses from https://spdx.org/licenses/ and https://opensource.org/licenses/alphabetical and search for occurrences in those files.
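The search itself could look roughly like the sketch below. The `KNOWN_LICENSES` table is a tiny hand-written stand-in for the database described above (its keys are illustrative labels, not necessarily SPDX identifiers), and `detect_licenses` is a hypothetical name. Matching is case-insensitive and requires that the name is not embedded in a longer word, so "GPL" does not match inside "LGPL"; both choices are assumptions.

```python
import re

# Stand-in for a database built from the SPDX and OSI lists.
# Keys are illustrative labels; values are names to look for in the text.
KNOWN_LICENSES = {
    "MIT": ["MIT License", "MIT"],
    "Apache-2.0": ["Apache License, Version 2.0", "Apache License Version 2.0"],
    "GPL": ["GNU General Public License", "GPL"],
    "Public Domain": ["Public Domain"],
}


def detect_licenses(text):
    """Return the labels of all known licenses mentioned in the text."""
    found = []
    for label, names in KNOWN_LICENSES.items():
        for name in names:
            # Reject matches that are part of a longer alphanumeric token.
            pattern = (
                r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
            )
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.append(label)
                break  # one hit per license is enough
    return found
```

A real implementation would need more care with short names: a case-insensitive match on "MIT" can also hit the German word "mit", so short abbreviations may deserve stricter rules than full license titles.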
Sometimes the repository has a custom license, for instance: https://github.com/w3c-validators/w3c_validators/blob/master/LICENSE. In that case no license from the database matches, but since the repository does have a LICENSE file, we can include the URL of this file in the "licenses" field.
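The fallback for custom licenses could be sketched as follows. `resolve_licenses` and its parameters are hypothetical names; the only behaviour taken from this ticket is that the URL of the LICENSE file is used when no known license matches.

```python
def resolve_licenses(detected, license_file_url):
    """Build the value of the "licenses" field.

    detected: labels of known licenses found in the repository files.
    license_file_url: URL of the LICENSE file, or None if there is none.
    """
    if detected:
        return list(detected)
    if license_file_url:
        # Custom license: point at the file instead of naming a license.
        return [license_file_url]
    return []
```

With no matches and a LICENSE file present, the result is a one-element list containing the file URL; with neither, the list is empty and the license stays unknown.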