RFC: Specify optional `Dependencies.NeededSystemPackages` field for `PackageInfo.g`

Open fingolfin opened this issue 2 months ago • 5 comments

Problem: several GAP packages have external dependencies that one has to install in order to use them. But packages have no systematic way of specifying these in a form that automated tools could use (there is `ExternalConditions`, but it is at best a hint for humans).

As a result, e.g. the GAP package distribution hardcodes a list of such dependencies. The GitHub workflow actions could also benefit from such a list; and in our INSTALL.md we provide some additional hand-curated information about packages.

Solving this in general is a fiendishly difficult problem and I have no ambitions to do that.

My goal is rather to solve this for, let's say, the ubuntu-latest and macos-latest runners on GitHub, and maybe a bit beyond: "most" Debian/Ubuntu systems, "most" macOS systems using Homebrew, and perhaps a few bonus Linux distros thrown in (e.g. Fedora).

This can be done by simply encoding the package names that need to be installed in each case. Here is my proposal for how that could look. Specifically, I am thinking of adding something like this to the Dependencies record in PackageInfo.g for various packages. This is the entry for the example package:

  # for packages that need external software, besides the purely descriptive
  # `ExternalConditions` we also offer the following, which gives an
  # actionable description of needed external software at least for some
  # package distributions.
  # If present this must be a record. The keys specify the package manager.
  # For Linux distributions, it uses the Distributor IDs as provided by
  # `lsb_release -i`. For Homebrew on macOS, use "Homebrew".
  #
  # Each of these then maps to a list of package descriptors. Each package
  # descriptor is a list of strings. The first string is the name of a system
  # package to be installed. The others are reserved for future usage (e.g. to
  # specify minimal package versions)
  NeededSystemPackages := rec(
    #Homebrew := [["somepackage"]],
    #Debian :=   [["libsomepackage-dev"]],
    #Ubuntu :=   [["libsomepackage-dev"]],
    #Fedora :=   [["somepackage-devel"]],
  ),
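A tool consuming this field first has to work out which key applies to the current system. The following is a minimal sketch in Python (not GAP), assuming the usual `Distributor ID:` line printed by `lsb_release -i`; the function name `parse_distributor_id` is hypothetical, and how Homebrew would be detected is left open here.

```python
def parse_distributor_id(lsb_output):
    """Extract the distributor ID (e.g. "Ubuntu") from `lsb_release -i` output.

    Returns None when no "Distributor ID" line is present.
    """
    for line in lsb_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Distributor ID":
            return value.strip()
    return None
```

The returned string would then be used directly as the key into the record.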

This is how it would look for curlInterface (CC @ChrisJefferson)

  NeededSystemPackages := rec(
    Homebrew := [["curl"]],
    Debian := [["libcurl4-openssl-dev"]],
    Ubuntu :=  [["libcurl4-openssl-dev"]],
    Fedora :=  [["curl-devel"]],
  ),
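To illustrate how a tool outside GAP (for example a workflow helper) might read such a record, here is a small Python sketch. The dictionary is a hand-written mirror of the curlInterface entry above, and `system_packages_for` is a hypothetical helper, not an existing API.

```python
# Hypothetical Python mirror of the curlInterface record above; the real
# field lives in PackageInfo.g and is written in GAP.
NEEDED_SYSTEM_PACKAGES = {
    "Homebrew": [["curl"]],
    "Debian": [["libcurl4-openssl-dev"]],
    "Ubuntu": [["libcurl4-openssl-dev"]],
    "Fedora": [["curl-devel"]],
}


def system_packages_for(needed, manager):
    """Return the system package names listed for `manager`.

    Returns None when the key is absent, so callers can tell "nothing is
    known for this system" apart from "nothing needs to be installed".
    """
    descriptors = needed.get(manager)
    if descriptors is None:
        return None
    # Only the first entry of each descriptor is a package name; the
    # remaining entries are reserved for future use.
    return [descriptor[0] for descriptor in descriptors]
```

For instance, `system_packages_for(NEEDED_SYSTEM_PACKAGES, "Fedora")` yields the list a caller could pass on to the system package manager.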

Goals:

Optional bonus goals:

In addition, I think these lists will also be helpful for downstream packagers: even if you package this for, say, FreeBSD, and thus can't use the package names directly, it still gives you a fairly good idea of what is needed.

Non-goals:

  • trying to solve this for "every distro"
  • defining some kind of "universal package identifier"
  • trying to have this work
  • trying to make this bulletproof -- e.g. if a distro renames a package, it's not immediately clear what to do. I suggest we worry about that when/if it happens...
  • deal with any kind of optional / conditional dependency

Assuming we agree on this or a variation, some things need to be done:

  • [ ] document this in the GAP manual
  • [ ] teach ValidatePackageInfo to validate it
  • [ ] add it to all packages that need it
  • [ ] update the package distro to use this
  • [ ] ...
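As a rough idea of what a purely structural check could cover, here is a Python sketch operating on a dictionary mirror of the record. It is an assumption about what such validation might look like, not the behaviour of ValidatePackageInfo; the function name and the messages are hypothetical.

```python
def validate_needed_system_packages(value):
    """Return a list of problems; an empty list means the structure is valid."""
    problems = []
    if not isinstance(value, dict):
        return ["NeededSystemPackages must be a record"]
    for manager, descriptors in value.items():
        if not isinstance(descriptors, list):
            problems.append(f"{manager}: expected a list of package descriptors")
            continue
        for index, descriptor in enumerate(descriptors):
            is_string_list = isinstance(descriptor, list) and all(
                isinstance(entry, str) for entry in descriptor
            )
            # A descriptor must be a non-empty list of strings whose first
            # entry (the package name) is itself non-empty.
            if not is_string_list or not descriptor or not descriptor[0]:
                problems.append(
                    f"{manager}[{index}]: expected a non-empty list of strings "
                    "starting with a package name"
                )
    return problems
```

A common mistake this would catch is writing `Debian := ["libcurl4-openssl-dev"]` (a list of strings) instead of a list of descriptors.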

Note that an alternative pre-proposal exists in https://github.com/gap-system/PackageDistro/issues/1211 but I don't see how to implement what is proposed there without a lot more work than what I suggest here; and the added benefit of that is unclear to me. As far as I can tell, under the hood the proposed tooling resorts to a hardcoded list of package names in various distributions anyway (but I might be wrong on that). That said, I'd be open to such a "better" (?) solution if someone wants to work on it; but in the meantime my proposal is meant to provide precisely what we need for the PackageDistro and also for some other tooling (e.g. gap-actions for workflows) while being fairly easy to implement.

fingolfin avatar Nov 05 '25 13:11 fingolfin

Key points from my side.

  1. Appropriate decoupling of responsibilities. We should avoid, at almost any cost, turning package dependencies into system dependencies of gap-core.

  2. Current market solutions. The market has partially solved the problem we are addressing here. It almost always boils down to expecting a statically built binary to be delivered with the software (plugins) from such a package. For clarity: Terraform has providers; both the provider and Terraform are statically linked applications with an interface between core and the provider.

xi-mbp $ λ file /usr/bin/terraform
/usr/bin/terraform: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=N6Xl1qpb-xIGb8Wl8Nyl/GBpEA6D-qhcsguXeh3P2/DN4_Wbvat6F2V_XJjGQ9/t4xTCpI3vo4bFNg7LzoP, stripped

xi-mbp $ file .terraform/providers/registry.terraform.io/hashicorp/azurerm/4.51.0/linux_arm64/terraform-provider-azurerm_v4.51.0_x5
.terraform/providers/registry.terraform.io/hashicorp/azurerm/4.51.0/linux_arm64/terraform-provider-azurerm_v4.51.0_x5: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, BuildID[sha1]=e4d79771481faa6be07295ab9000a7d965331171, stripped

To be fair, almost all modern software is moving to statically linked binaries, since storage is far cheaper than it was years ago.

  3. Return on investment. I am not convinced that the type of solution I mentioned in point (2) is easy to implement in our case. It requires a lot of work, but in my opinion developing such a system will solve many problems for us in the future and will decouple core from other components.

  4. Avoiding a closed catalog of operating systems. I also wouldn't want to resolve dependencies for individual operating systems, even the most popular ones. This will always leave us in the unfavorable situation where the catalog of distributions is closed. Software like Filebeat, Auditbeat, and kubectl ships as a complete binary, and the only limitation for us is the architecture (ARM / AMD64).

  5. Difficulties for closed-catalog operating systems. Validating such dependencies in ValidatePackageInfo will be very difficult and highly dependent on the operating system, and full coverage even for the systems you mentioned may be problematic. Moreover, it may require various additional functionality.

  6. Responsibility of PackageDistro. I would see the responsibility of PackageDistro only as ensuring that the package provides all the appropriate binaries. (How this is achieved, for example whether there should be separate download links in PackageInfo, is another matter.)

  7. My help. I can take care of running the build on different architectures and on multiple platforms.

  8. Security considerations. Out of scope for this problem. We do not validate the source code of the libraries people deliver, but we should start thinking about it. As things stand, every library may contain malicious code that can be used for initial access. (The same attack vectors apply to (2).)

limakzi avatar Nov 05 '25 14:11 limakzi

@limakzi no offense, but this kind of AI generated answer is not helpful.

fingolfin avatar Nov 05 '25 15:11 fingolfin

@fingolfin I doubt GenAI parsed and generated information on azurerm provider for terraform as an example. :) I would prefer to be more respectful to others' thoughts.

  • The general idea. It is the package's responsibility to deliver the libraries required to load the package properly. Modern software requires such libraries to be statically linked. Sometimes the plugin is even an executable file, as with Terraform and its providers. Example:
λ ./.terraform/providers/registry.terraform.io/hashicorp/azurerm/4.51.0/linux_arm64/terraform-provider-azurerm_v4.51.0_x5
This binary is a plugin. These are not meant to be executed directly.
Please execute the program that consumes these plugins, which will
load any plugins automatically

We should only ever download already built libraries from the package maintainer. Every package maintainer knows best what is required to run the package. It is PackageDistribution's responsibility to make cross-checks between packages.

And I really know how hard it is to maintain even a simple module whose only responsibility is to provide an abstraction layer for installing packages on various operating systems. apt has a different caching mechanism than yum; yum pins packages differently; dnf uses libsolv to solve package dependencies. An Iliad of problems. Sooner rather than later, I always moved to infrastructure as code designed for one specific operating system.

I would be afraid of extending PackageManager's responsibilities to something that requires superuser privileges. And it's not as if no software does it this way. A good example is tenv. But it manages a closed catalog of binaries that are statically linked and properly cached.


  • If that's the way we want to go. If we really want to keep external dependencies in PackageInfo, I would be OK with extending its responsibilities to PackageManager, but I would prefer that we make another decision first: that no operating system other than Ubuntu (or Debian, or Rocky, whichever) is officially supported. Without that, we will jump into dependency hell.

An obvious counterexample to this idea is a simple operating system upgrade. Let's take this piece of code:

  NeededSystemPackages := rec(
    Debian := [["libcurl4-openssl-dev"]],
    Ubuntu :=  [["libcurl4-openssl-dev"]],
  ),

What is Ubuntu? Ubuntu 22.04 or Ubuntu 24.04? Or the latest LTS version? This simple solution becomes:

  NeededSystemPackages := rec(
    Debian := rec( bookworm := ["libcurl4-openssl-dev"], trixie := ["libcurl4-openssl-dev"] ),
    Ubuntu := rec( jammy := ["libcurl4-openssl-dev"], noble := ["libcurl4-openssl-dev"] ),
  ),

Here comes another set of problems: keeping release names consistent across packages, and validating all of that at the PackageManager level. Argh.

Last but not least: if we decide to go this way, since this proposal can deliver useful information, this is what I would want:

  • PackageManager MUST validate NeededSystemPackages
  • PackageManager MUST NOT install these packages.
  • The field NeededSystemPackages MAY be set.
  • The field NeededSystemPackages MUST have a very, very specific object structure. What I proposed above is just an RFC.

Postscriptum. If that is the complete list of external dependencies, that is really lovely. Very small.

limakzi avatar Nov 05 '25 17:11 limakzi

This would certainly be of great help for the GitHub actions, especially in the situation where a package depends on another package that has external dependencies.

Some thoughts/questions:

  1. What would/should ValidatePackageInfo validate, exactly? A (hopefully simple) check whether the "structure" of NeededSystemPackages is correct? If the goal is instead to check whether all external dependencies are installed/available for installation, what should it do when called from a distro not listed? We probably don't want ValidatePackageInfo to fail? I would, for such checks, probably prefer a new function that returns true/false/fail, where fail indicates no check can be done.

  2. Essentially, the information is a matrix, where rows indicate operating systems/distros/package managers..., and columns are separate dependencies (or vice versa). Do we permit empty cells in this matrix?

  3. Since one of the non-goals is "deal with any kind of optional / conditional dependency": is this meant only for actually 'needed' external dependencies? I would expect not, so maybe the variable shouldn't start with Needed?

  4. The option to have the actual name of the dependency in there somewhere might be nice, e.g. for use in GitHubPagesForGAP?

  5. I know being bulletproof is a non-goal, but perhaps there's a simple enough way to deal with renamed / unavailable packages? E.g. as an alternative to (not instead of) ["curl"] we also allow [["curlv1","-22.04"],["curlv2","22.10-27.04"], ["curlv3","27.10"],["curlv4","28.04-"]]. So it would be a list of lists, each element consisting of a package name and a version range (which we semver-compare with lsb_release -r)?
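The version-range idea in item 5 can be sketched as follows, in Python rather than GAP. The sketch assumes that ranges are inclusive, that a bare version means an exact match, and that releases are purely numeric dotted strings as printed by `lsb_release -r` (so a value such as "testing" is not handled). All function names are hypothetical.

```python
def parse_version(text):
    """Turn a release string such as "22.04" into a comparable tuple (22, 4)."""
    return tuple(int(part) for part in text.split("."))


def in_range(release, spec):
    """Check `release` against "-22.04", "22.10-27.04", "27.10" or "28.04-"."""
    version = parse_version(release)
    if "-" not in spec:
        # A bare version is treated as an exact match.
        return version == parse_version(spec)
    low, high = spec.split("-", 1)
    if low and version < parse_version(low):
        return False
    if high and version > parse_version(high):
        return False
    return True


def pick_package(candidates, release):
    """Return the first package name whose range contains `release`, else None."""
    for name, spec in candidates:
        if in_range(release, spec):
            return name
    return None
```

With the candidate list from item 5, a release between two ranges (for example 22.05) matches nothing, which is one way an "empty cell" could arise.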

Just to spark some discussion, what about this alternative format:

NeededSystemPackages := [
  rec( name := "curl", Homebrew := ["curl"], Debian := ["libcurl4-openssl-dev"] ),
  [...]
],
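Seen as the matrix from question 2, this alternative format makes empty cells easy to spot. A small Python sketch, with a dictionary mirror of the entry above and a hypothetical helper name:

```python
# Hypothetical Python mirror of the alternative format shown above.
ALTERNATIVE = [
    {"name": "curl", "Homebrew": ["curl"], "Debian": ["libcurl4-openssl-dev"]},
]


def packages_by_dependency(entries, manager):
    """Map each dependency name to its package list for `manager`.

    A value of None marks an empty cell: the dependency exists, but no
    package name is recorded for this manager.
    """
    return {entry["name"]: entry.get(manager) for entry in entries}
```

Here `packages_by_dependency(ALTERNATIVE, "Fedora")` reports curl as a known dependency with no Fedora package recorded.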

stertooy avatar Nov 05 '25 18:11 stertooy

Problem. My two main concerns:

  a. What will execute package installation from this list? (Maybe nothing? We just want to validate and test whether a package is installed. Preferred.)

  b. How will we validate package installation?


> This would certainly be of great help for the GitHub actions, especially in the situation where a package depends on another package that has external dependencies.

Yes. We could deliver something like that. If people want to test their packages against various operating systems, we could help them do that through actions: deliver an abstraction layer for various operating systems running in containers. Great idea. It will be useful for @fingolfin too, because it could help test gap-system as a whole. I think it will improve the gap-tests action too.


> What would/should ValidatePackageInfo validate, exactly? A (hopefully simple) check whether the "structure" of NeededSystemPackages is correct?

I would limit the responsibility of ValidatePackageInfo to validating the structure of the PackageInfo file. A new function should be implemented for anything beyond that, and executed after PackageInfo validation has passed.
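Such a separate check, which only inspects the system and never installs anything, could follow the true/false/fail idea raised earlier in the thread. The Python sketch below uses None in place of GAP's fail. It assumes Debian-like systems can be probed with `dpkg-query -W -f='${Status}'`; probes for other keys are deliberately left out, and all names are hypothetical.

```python
import subprocess


def dpkg_is_installed(package):
    """Ask dpkg-query whether `package` is fully installed (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "install ok installed"


# Probes per key; only the dpkg-based ones are filled in for this sketch.
PROBES = {"Debian": dpkg_is_installed, "Ubuntu": dpkg_is_installed}


def check_system_packages(needed, manager, probes=PROBES):
    """Return True/False if a check was possible, None ("fail") otherwise."""
    descriptors = needed.get(manager)
    probe = probes.get(manager)
    if descriptors is None or probe is None:
        # Unknown system or no probe available: no statement can be made.
        return None
    return all(probe(descriptor[0]) for descriptor in descriptors)
```

Passing the probes in as a parameter keeps the decision logic testable without touching the real package database.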


> Since one of the non-goals is "deal with any kind of optional / conditional dependency": is this meant only for actually 'needed' external dependencies?

This is how I understand the problem. The goal is to move external dependencies from the hard-coded dependencies.py to PackageInfo. Moreover, the only thing we want to solve is system package dependencies. Correct me if I'm wrong, @fingolfin.

Example. For gapdoc to run at all, it is necessary to have the binaries from this list. We want to remove gapdoc's dependencies from this list and move them to the PackageInfo of gapdoc.


> I would expect not, so maybe the variable shouldn't start with Needed?

Maybe something stronger? Required? (By analogy with requirements.txt in Python, etc.)


> I know being bulletproof is a non-goal, but perhaps there's a simple enough way to deal with renamed / unavailable packages? E.g. as an alternative to (not instead of) ["curl"] we also allow [["curlv1","-22.04"],["curlv2","22.10-27.04"], ["curlv3","27.10"],["curlv4","28.04-"]]. So it would be a list of lists, each element consisting of a package name and a version range (which we semver-compare with lsb_release -r)?

  • Are we going to keep different lists of dependencies for different versions of the same operating system?
  • Are we going to validate the versions of installed packages?
  • It is not the goal to solve this for every distro, so which distros will we support?

I like the idea of adding an abstraction for the dependency name. Proposal:

RequiredSystemPackages := [
  rec( name := "curl", 
       homebrew := rec( any := ["curl"] ),
       debian := rec( bookworm := ["curl"] ) 
  ),
],
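How a lookup against this shape might work is sketched below in Python. The proposal does not spell out the semantics of `any`; the sketch assumes an exact release key wins and `any` is the fallback. The dictionary mirrors the record above and `resolve` is a hypothetical helper.

```python
# Hypothetical Python mirror of the RequiredSystemPackages proposal above.
REQUIRED_SYSTEM_PACKAGES = [
    {
        "name": "curl",
        "homebrew": {"any": ["curl"]},
        "debian": {"bookworm": ["curl"]},
    },
]


def resolve(entries, manager, release):
    """Return {dependency name: package list or None} for one system."""
    resolved = {}
    for entry in entries:
        per_release = entry.get(manager, {})
        # Assumed lookup order: an exact release key wins over "any".
        packages = per_release.get(release, per_release.get("any"))
        resolved[entry["name"]] = packages
    return resolved
```

Under this assumption a Debian release that is not listed (say trixie) resolves to None, which makes the "which distros will we support" question visible in the data.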

limakzi avatar Nov 05 '25 21:11 limakzi