`build_home_license()` interaction with `usethis::use_*_license(include_future = TRUE)`
If `usethis::use_*_license()` is called with `include_future = TRUE` to apply a GPL, AGPL, LGPL or Apache licence to a package, it sets the `License` field in the DESCRIPTION file to something along the lines of `GPL (>= 3)`.
Whilst this versioning is picked up by pkgdown, where it doesn't exactly match any entry in license.db it is just displayed as a text string. Would it be better for build_home_license() to identify where a future provision is included in a package license, then proceed as if the user had used a strict version of the license, before appending the term "or later"?
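To make the proposal concrete, here is a minimal sketch in base R of how a "future" provision could be detected and reduced to a strict version. `parse_future_license()` is a hypothetical helper, not an existing pkgdown function, and it assumes a single-word licence name (GPL, AGPL, LGPL), so a field like `Apache License (>= 2)` would need separate handling.

```r
# Hypothetical helper (not part of pkgdown): split a License entry such as
# "GPL (>= 3)" into a strict licence name plus an "or later" flag.
parse_future_license <- function(x) {
  # One word, then "(>= N)" with an optional space after ">="
  pattern <- "^([A-Za-z]+) \\(>= ?([0-9.]+)\\)$"
  or_later <- grepl(pattern, x)
  # Rewrite matching entries to the strict form, e.g. "GPL-3";
  # entries that don't match are returned unchanged
  strict <- ifelse(or_later, sub(pattern, "\\1-\\2", x), x)
  data.frame(strict = strict, or_later = or_later, stringsAsFactors = FALSE)
}

parse_future_license(c("GPL (>= 3)", "GPL (>=2)", "MIT"))
```

The `strict` column could then go through the existing linking logic, with "or later" appended wherever `or_later` is `TRUE`.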
I think this is a good idea in principle, but if we start down this path, I think we'd need to support the full R license specification, and that's going to be complicated. The code that base R uses is at https://github.com/wch/r-source/blob/d35cfa99425da7f9d57928efa1961dc5f9caaef2/src/library/tools/R/license.R#L83. We could maybe do something like tools:::expand_license_spec_component_from_db("GPL (>=3)"), but the output from that is not super easy to work with.
...
Hmmmm, actually, maybe we could just special case the most common uses of >=:
library(dplyr, warn.conflicts = FALSE)
library(stringr)
packages <- as_tibble(available.packages())
parsed <- packages |>
  select(package = Package, license = License) |>
  mutate(
    # These are regular expressions (note the [CS] class), so escape `|` and `+`
    # rather than wrapping the pattern in fixed(), which would never match
    or_file = str_detect(license, " \\| file LICEN[CS]E"),
    plus_file = str_detect(license, " \\+ file LICEN[CS]E"),
    license = str_remove(license, " [+|] file LICEN[CS]E")
  )
parsed |> count(license, sort = TRUE)
#> # A tibble: 121 × 2
#>    license                n
#>    <chr>              <int>
#>  1 GPL-3               4804
#>  2 MIT                 4369
#>  3 GPL (>= 2)          4270
#>  4 GPL-2               2454
#>  5 GPL (>= 3)          1816
#>  6 GPL                  446
#>  7 GPL-2 | GPL-3        343
#>  8 CC0                  232
#>  9 LGPL-3               170
#> 10 Apache License 2.0   160
#> # ℹ 111 more rows
Created on 2024-04-17 with reprex v2.1.0
Looks like if we just handled GPL (>= 2) and GPL (>= 3) that'd get us support for an extra ~6000 packages, which is ~30% of CRAN.
Given this scope, I think this would be a good issue for TDD — it's just a matter of manually writing the HTML for these three cases and then making a named character vector which is used in autolink_license().
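A rough sketch of what that named character vector might look like, assuming the lookup is keyed on the exact `License` string. The names `license_special` and `autolink_special()` are hypothetical, the HTML is hand-written for illustration, and how this would actually hook into autolink_license() is not shown here.

```r
# Hypothetical lookup table: exact License strings mapped to hand-written HTML.
# The links point at the licence texts hosted on r-project.org.
license_special <- c(
  "GPL (>= 2)" = "<a href='https://www.r-project.org/Licenses/GPL-2'>GPL</a> (&gt;= 2)",
  "GPL (>= 3)" = "<a href='https://www.r-project.org/Licenses/GPL-3'>GPL</a> (&gt;= 3)"
)

# Hypothetical helper: swap in the hand-written HTML for known special cases
# and return every other licence string unchanged.
autolink_special <- function(x) {
  hit <- x %in% names(license_special)
  x[hit] <- unname(license_special[x[hit]])
  x
}

autolink_special(c("GPL (>= 3)", "MIT"))
```

A test-first approach would start from expectations on these strings and then fill in the vector until they pass.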