Feature: Add the `generate_pkgid_to_licenses()` function for retrieving the licenses for artifacts

Open DilumAluthge opened this issue 6 months ago • 2 comments

Motivation

Currently, PackageAnalyzer.jl offers the ability to process a Manifest.toml and extract the licenses for all of the Julia packages.

However, many Julia packages use binary artifacts, and the licenses of those binary artifacts often differ from the license of the Julia wrapper package.

This PR adds the ability to extract the licenses for the binary artifacts.

Example usage

import Pkg
import PackageAnalyzer

Pkg.activate("/path/to/my/project")
Pkg.instantiate()
Pkg.precompile()

my_manifest = "/path/to/my/project/Manifest.toml"

all_pkgs = PackageAnalyzer.find_packages_in_manifest(my_manifest)
jll_pkgs = filter(x -> endswith(x.name, "_jll"), all_pkgs)


artifact_hash_to_licenses = Dict{Base.SHA1,Vector{PackageAnalyzer.ArtifactLicenseInfo}}()

PackageAnalyzer.generate_artifact_hash_to_licenses!(
    artifact_hash_to_licenses,
    jll_pkgs;
    allow_no_artifacts=Base.PkgId[],
)

pkgid_to_licenses = PackageAnalyzer.artifact_license_map(
    jll_pkgs,
    artifact_hash_to_licenses;
    allow_no_artifacts=Base.PkgId[],
)

DilumAluthge, May 14 '25 12:05

@ericphanson @giordano: Would you be able to review this?

DilumAluthge, May 29 '25 23:05

I think it would make sense to integrate this feature more closely with the existing code. For example, we could add an `artifact_licenses` field to `PackageV1`:

https://github.com/JuliaEcosystem/PackageAnalyzer.jl/blob/9ca90f7ea163db996f80febfa296c3f944cb4d53/src/PackageAnalyzer.jl#L128-L129

This is not a breaking change (see the note at https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/#The-PackageV1-struct).

Then we can automatically populate it when analyzing that specific package.

Then we don't need to call `obtain_code` here or expand the API with `artifact_license_map`. Instead, we just update the analysis of the package directory to also pull in artifact licenses and expose them in `PackageV1`. This lets the existing API (`analyze_manifest`, `analyze` with lists of packages, etc.) do a lot of the work, so we could have a smaller PR here but still get the artifact license functionality.
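
Very roughly, the shape could be something like the sketch below. Everything in it is a hypothetical stand-in: `PackageSketch`, `ArtifactLicenseSketch`, `find_artifact_license_paths`, and `analyze_sketch` are illustrative names, not PackageAnalyzer's actual definitions, and it assumes that license files sit under `share/licenses` inside the installed artifact directory (the usual layout for BinaryBuilder-built JLL artifacts).

```julia
# Hypothetical stand-ins for illustration; NOT PackageAnalyzer's real types.
struct ArtifactLicenseSketch
    artifact_name::String
    license_paths::Vector{String}  # paths relative to the artifact directory
end

Base.@kwdef struct PackageSketch
    name::String
    license_files::Vector{String} = String[]
    # Added field with a default, so existing keyword-based construction keeps working.
    artifact_licenses::Vector{ArtifactLicenseSketch} = ArtifactLicenseSketch[]
end

# Assumption: license files live under `share/licenses` inside the artifact.
# Returns an empty vector when that directory does not exist.
function find_artifact_license_paths(artifact_dir::AbstractString)
    license_root = joinpath(artifact_dir, "share", "licenses")
    isdir(license_root) || return String[]
    paths = String[]
    for (root, _, files) in walkdir(license_root)
        for file in files
            push!(paths, relpath(joinpath(root, file), artifact_dir))
        end
    end
    return sort!(paths)
end

# Analyze one package; `artifact_dirs` maps artifact name => installed directory.
function analyze_sketch(
    name::AbstractString,
    artifact_dirs::AbstractDict{<:AbstractString,<:AbstractString},
)
    artifact_licenses = [
        ArtifactLicenseSketch(artifact_name, find_artifact_license_paths(dir))
        for (artifact_name, dir) in artifact_dirs
    ]
    return PackageSketch(; name=name, artifact_licenses=artifact_licenses)
end
```

With this shape, callers who never look at `artifact_licenses` are unaffected, and the artifact information travels with the per-package result instead of through a separate map.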

The other thing is I'm not sure we should use PkgId. That's a Base internal type and could expose us to breaking changes. I think it's probably enough to expose artifact licenses in PackageV1; the caller can map to PkgIds if they want.
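
For callers who do want `PkgId` keys, the mapping is a one-liner on their side. A minimal sketch, assuming each analysis result exposes `name`, `uuid`, and `artifact_licenses` (the named tuples and UUIDs below are made-up placeholders, not real analysis output):

```julia
import UUIDs

# Hypothetical analysis results; names, UUIDs, and licenses are placeholders.
results = [
    (name = "Foo_jll",
     uuid = UUIDs.UUID("00000000-0000-0000-0000-000000000001"),
     artifact_licenses = ["MIT"]),
    (name = "Bar_jll",
     uuid = UUIDs.UUID("00000000-0000-0000-0000-000000000002"),
     artifact_licenses = ["GPL-2.0-or-later", "BSD-3-Clause"]),
]

# The caller, not the library, builds the PkgId-keyed map.
pkgid_to_licenses = Dict(
    Base.PkgId(r.uuid, r.name) => r.artifact_licenses for r in results
)
```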

ericphanson, Nov 10 '25 18:11