
Design principles for expanded distro support

Open nitzmahone opened this issue 1 year ago • 0 comments

As soon as I inherited architectural responsibility for ansible-builder and started working on support for vanilla base images, I knew one of the first asks was going to be "how do I build an EE on a $non_RHELish_distro base image?" :laughing:. Conceptually, I have no problem with making builder friendlier to non-RHELish distro images, but I want to make sure we've considered the ecosystem-wide ramifications of doing so, and that we agree on a broader set of design principles for "things we'll support" and (probably more importantly) "things we won't".

This issue is just to declare the need for some prerequisite design work. We don't need to solve everything here, but we shouldn't close this issue (or merge changes that further expand the supported base image distros) until we have some documented consensus on a path forward.

The most immediate problems I see with expanding base container distro support are:

  1. defining efficient ways to deal with arbitrary package managers (and their configs) across multiple layers of bootstrapping
  2. dealing with "moving target" bindep profiles in collection dep metadata
  3. keeping everything working in builder without testing against a ridiculous number of base images

Item 1 seems simple on the surface, but it gets hairy fast. A number of builder's current default package manager args are of unknown origin or necessity, and they're applied inconsistently; we just didn't have time to reverse-engineer all of that for 3.0. There's also a difficult balance to strike between build performance and final image size. For example, inline per-invocation cleanup is expensive, slow, and error-prone, but skipping it or doing it wrong results in bloated images (unless the final image is squashed). It also vastly complicates the multi-distro story: some distros require explicit cache management, and cleanup may require blasting 1-N arbitrary directories and/or running multiple commands, all under the same RUN directive. The necessary args for a given distro also change over time as new variants and configs are introduced (e.g., microdnf, the upcoming dnf5, EPEL, and others). I really don't want this to turn into a "builder package manager plugin" that becomes a cut-down version of core's package action and modules, but short of that, we'll need to very carefully define the places where package manager args can be injected and what kinds of things we can support there.
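To make the "same RUN directive" constraint concrete, here's a minimal Python sketch that renders one install-plus-cleanup line per package manager. The `PkgMgrProfile` class, the `PROFILES` table, and `render_install()` are hypothetical names for illustration only, and the command strings are common conventions for each tool, not builder's actual defaults.

```python
# Hypothetical sketch: one RUN line per install so cleanup shares the install's layer.
# The per-package-manager table is illustrative, not ansible-builder's real defaults.
from dataclasses import dataclass


@dataclass(frozen=True)
class PkgMgrProfile:
    install: str       # non-interactive install command prefix
    pre: str = ""      # optional step that must run first (e.g. refresh indexes)
    cleanup: str = ""  # cache cleanup that must run in the same RUN directive


PROFILES = {
    "dnf": PkgMgrProfile(install="dnf install -y", cleanup="dnf clean all"),
    "microdnf": PkgMgrProfile(install="microdnf install -y", cleanup="microdnf clean all"),
    "apt-get": PkgMgrProfile(
        pre="apt-get update",
        install="apt-get install -y --no-install-recommends",
        cleanup="rm -rf /var/lib/apt/lists/*",
    ),
}


def render_install(pkg_mgr: str, packages: list[str]) -> str:
    """Chain pre-step, install, and cleanup with && into a single RUN directive."""
    profile = PROFILES[pkg_mgr]
    steps = [profile.pre, f"{profile.install} {' '.join(packages)}", profile.cleanup]
    return "RUN " + " && ".join(step for step in steps if step)


if __name__ == "__main__":
    for mgr in PROFILES:
        print(render_install(mgr, ["gcc", "git"]))
```

The cleanup has to be chained into the same RUN because each directive produces its own image layer; deleting a cache in a later RUN hides the files but doesn't shrink the earlier layer (again, unless the image is squashed). Even this toy table already needs a distro-specific pre-step (`apt-get update`) and a raw directory removal, which is the kind of per-distro drift described above.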

Item 2 is probably the most difficult ecosystem-wide pain point for multi-distro support. bindep supports various (mostly undocumented?) profiles to constrain package installs to specific package managers, OSs, and OS "families", but it's probably unreasonable to expect collection maintainers to exhaustively test every combination to ensure that a given distro doesn't miss an OS package dep or (maybe worse) include one that isn't available. Whatever we end up doing for https://github.com/ansible/ansible-builder/issues/493 should at least provide an escape hatch for this problem, but it's already incredibly difficult for collection maintainers to specify deps properly. Multi-distro support makes it a lot harder, and creates even more opportunities for carelessly added deps to break existing setups when they're improperly constrained.
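A rough illustration of both failure modes, using a deliberately simplified model of bindep's `platform:` selectors (only positive and `!`-negated platform profiles; real bindep also handles non-platform profiles, version constraints, and more). The `selected()` and `resolve()` helpers, the profile sets, and the sample bindep contents are all hypothetical.

```python
# Deliberately simplified model of bindep platform selectors, for illustration only.
from typing import Optional

BINDEP = """\
gcc
libffi-devel [platform:rpm]
libffi-dev [platform:dpkg]
python3-devel [platform:rpm]
krb5-devel
"""

# Abbreviated, assumed profile sets that a host of each family might match.
RHEL_LIKE = {"platform:rpm", "platform:redhat"}
DEBIAN_LIKE = {"platform:dpkg", "platform:debian"}


def selected(line: str, platforms: set[str]) -> Optional[str]:
    """Return the package name if this bindep-style line applies to `platforms`.

    Simplified rules: a line with no selectors is always selected; if it has
    positive selectors, at least one must match; no negated (!) selector may match.
    """
    name, _, rest = line.partition("[")
    selectors = rest.rstrip("]").split()
    positive = [s for s in selectors if not s.startswith("!")]
    negative = [s[1:] for s in selectors if s.startswith("!")]
    if any(s in platforms for s in negative):
        return None
    if positive and not any(s in platforms for s in positive):
        return None
    return name.strip()


def resolve(text: str, platforms: set[str]) -> list[str]:
    """Apply `selected` to every non-blank, non-comment line, preserving order."""
    result = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        pkg = selected(line, platforms)
        if pkg:
            result.append(pkg)
    return result


if __name__ == "__main__":
    print("rhel-like:  ", resolve(BINDEP, RHEL_LIKE))
    print("debian-like:", resolve(BINDEP, DEBIAN_LIKE))
```

On a RHEL-family image everything resolves, but on a Debian-family image `python3-devel` silently drops out (its `platform:rpm` constraint has no `platform:dpkg` counterpart such as `python3-dev`), while the unconstrained rpm-style name `krb5-devel` is still requested even though Debian calls that package `libkrb5-dev`. Neither problem surfaces unless someone actually builds on both families.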

The difficulty of Item 3 can probably be contained if we keep the solution to Item 1 narrowly scoped. As part of the Builder 3.0 exercise, I've been trying to collect the requirements for EEs, which have become scattered across numerous projects, into a single place that will eventually be "the EE spec". We already need to greatly expand builder's test matrix to ensure that it works in all the scenarios it needs to (mainly the many versions and configs of podman/docker/cri-o, and the bugs we have to work around to provide consistent EE behavior across all of those environments). If we can keep the configurations down to a handful of easy defaults for very common distros (which we'd actually run integration test builds against) and provide a tightly scoped set of overrides for the rest (which we can unit-test to ensure they appear in the right places at build time), this might not be so bad... We can't let the builder test matrix get out of hand, though; if it starts looking anything like core's, we have a problem.
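A sketch of what "unit-test the overrides" could look like: a stand-in renderer that places user-supplied steps into named stages and rejects unknown hooks, plus a helper that reports which stage a line landed in. The stage names follow the v3 schema's `prepend_*`/`append_*` build-step hooks (base, galaxy, builder, final); `render()`, `stage_of()`, and the placeholder FROM lines are invented for illustration and don't reflect builder's real Containerfile output.

```python
# Hypothetical stand-in renderer used only to illustrate unit-testing where
# user overrides land; it does not reproduce builder's real Containerfile.
from typing import Optional

STAGES = ("base", "galaxy", "builder", "final")
ALLOWED_HOOKS = {f"{pos}_{stage}" for pos in ("prepend", "append") for stage in STAGES}


def render(steps: dict[str, list[str]]) -> list[str]:
    """Place prepend_<stage>/append_<stage> steps around each stage's generated body."""
    unknown = set(steps) - ALLOWED_HOOKS
    if unknown:
        raise ValueError(f"unknown build step hooks: {sorted(unknown)}")
    lines: list[str] = []
    for stage in STAGES:
        lines.append(f"FROM placeholder-image AS {stage}")  # placeholder FROM line
        lines.extend(steps.get(f"prepend_{stage}", []))
        lines.append(f"# (builder-generated {stage} steps)")
        lines.extend(steps.get(f"append_{stage}", []))
    return lines


def stage_of(lines: list[str], needle: str) -> Optional[str]:
    """Return the name of the stage whose section contains `needle`, or None."""
    current = None
    for line in lines:
        if line.startswith("FROM "):
            current = line.rsplit(" AS ", 1)[1]
        elif line == needle:
            return current
    return None


if __name__ == "__main__":
    rendered = render({"prepend_base": ["RUN dnf install -y bash"], "append_final": ["RUN echo done"]})
    assert stage_of(rendered, "RUN dnf install -y bash") == "base"
    assert stage_of(rendered, "RUN echo done") == "final"
    # prepend steps must precede the generated body of their stage
    assert rendered.index("RUN dnf install -y bash") < rendered.index("# (builder-generated base steps)")
    print("override placement checks passed")
```

Checks like these need no container runtime, so they can cover many override combinations cheaply, leaving the expensive integration builds for the handful of default distros.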

Things I'd suggest we explicitly avoid:

  • Attempting to auto-detect base image type, package manager, or Python (beyond existing defaults). This goes sideways fast, and is basically core's setup.py problems all over again.

  • Attempting to support "multi-distro" EE definitions. I can't see how this wouldn't just devolve into Yet Another Template Language/Pre-processor. :laughing: If this is desired, use an existing template language to template builder's inputs, and keep builder completely unaware of it.

  • Support for non-bash shells. We're already leaning on some bash-isms in the support scripts; I don't think it's unreasonable to require bash to be present by the time prepend_base is done.

  • Expectation that any part of builder is "public API". We found out that some projects were importing introspect.py after it got moved. It's hard enough to make builder do all the things people want as a standalone tool; it becomes nearly impossible to change anything once there's an expectation that all its implementation details are stable and callable. The v3 schema exposes more of the basic structure and order of the build stages to EE authors, which we're kind of locked into now, and provides ways to disable some checks, but pretty much everything else is an implementation detail. The existence of the target scripts, the order in which they're called, their args, etc. are all subject to change.
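For the "multi-distro EE definitions" case, a minimal sketch of keeping the templating entirely outside builder, here with Python's stdlib `string.Template`. The variant names, image references, and the trimmed v3 fragment (a real definition would also need `dependencies` and likely more) are assumptions for illustration; `options.package_manager_path` is the v3 option that tells builder which package manager binary to use.

```python
# Hypothetical sketch: template builder's *inputs* outside builder, so builder
# only ever sees ordinary single-distro EE definitions.
from pathlib import Path
from string import Template

# Trimmed v3 fragment for illustration; a real definition needs more keys.
EE_TEMPLATE = Template(
    """\
version: 3
images:
  base_image:
    name: $base_image
options:
  package_manager_path: $pkg_mgr
"""
)

# Illustrative variants (assumptions, not a supported-distro list).
VARIANTS = {
    "stream9": {
        "base_image": "quay.io/centos/centos:stream9",
        "pkg_mgr": "/usr/bin/dnf",
    },
    "ubi9-minimal": {
        "base_image": "registry.access.redhat.com/ubi9/ubi-minimal",
        "pkg_mgr": "/usr/bin/microdnf",
    },
}


def render_all() -> dict[str, str]:
    """Render one EE definition per variant; substitute() raises KeyError on a missing value."""
    return {name: EE_TEMPLATE.substitute(values) for name, values in VARIANTS.items()}


if __name__ == "__main__":
    for name, text in render_all().items():
        Path(f"execution-environment-{name}.yml").write_text(text)
```

Each rendered file is then an ordinary single-distro EE definition that can be passed to builder as usual (e.g., `ansible-builder build -f execution-environment-ubi9-minimal.yml`), so builder never needs to know the variants exist.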

nitzmahone, Jun 05 '23 17:06