ghc-musl
Alpine package upgrades
During discussion in #21, I ran lsupg on utdemir/ghc-musl:v24-ghc922 and was surprised to see that many packages have updates for an image that was built so recently. I assume that ghc-musl containers are not public-facing, but statically linking old versions of libraries could have security implications for the static executables built using ghc-musl.
The image that I am testing was built within the past two days.
$ docker images utdemir/ghc-musl
REPOSITORY TAG IMAGE ID CREATED SIZE
utdemir/ghc-musl v24-ghc884 762d6c408038 About an hour ago 3.38GB
utdemir/ghc-musl v24-ghc922 e4b4c874e807 40 hours ago 3.71GB
utdemir/ghc-musl v24-ghc8107 bf3cac041a70 41 hours ago 3.42GB
utdemir/ghc-musl v24-ghc902 f7e76a7f3474 41 hours ago 3.32GB
There are a number of packages with updates available. In this case, ghc-musl users may be concerned with the old crypto, TLS, and SSL packages.
$ lsupg --docker utdemir/ghc-musl:v24-ghc922
apk busybox 1.34.1-r3 1.34.1-r4
apk ca-certificates-bundle 20191127-r7 20211220-r0
apk libcrypto1.1 1.1.1l-r8 1.1.1n-r0
apk libssl1.1 1.1.1l-r8 1.1.1n-r0
apk libretls 3.3.4-r2 3.3.4-r3
apk ssl_client 1.34.1-r3 1.34.1-r4
apk bash 5.1.8-r0 5.1.16-r0
apk expat 2.4.4-r0 2.4.7-r0
apk openssl-dev 1.1.1l-r8 1.1.1n-r0
apk openssl-libs-static 1.1.1l-r8 1.1.1n-r0
apk libxml2 2.9.12-r2 2.9.13-r0
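As a rough sketch, a listing like the one above can be narrowed down to the entries that matter most for statically linked executables. The function name `security_upgrades` and the name patterns are assumptions chosen for illustration, and the column layout (manager, package, installed, available) is taken from the output shown above.

```shell
# Filter an lsupg-style listing read from stdin, keeping only packages whose
# names suggest crypto/TLS relevance. The pattern list is an illustrative
# assumption, not an authoritative list of security-sensitive packages.
security_upgrades() {
  awk '$2 ~ /ssl|crypto|tls|ca-certificates/ { print $2, $3, "->", $4 }'
}

# Example using a few lines copied from the listing above.
security_upgrades <<'EOF'
apk busybox 1.34.1-r3 1.34.1-r4
apk libcrypto1.1 1.1.1l-r8 1.1.1n-r0
apk libretls 3.3.4-r2 3.3.4-r3
apk bash 5.1.8-r0 5.1.16-r0
EOF
```

With the four sample lines, only libcrypto1.1 and libretls are kept.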
The problem is that apk update is run to update the package index, but apk upgrade is not run to upgrade the packages that are already installed. The parent image (alpine:3.15.0) is over four months old, so the packages really should be upgraded, IMHO.
Using a specific version (3.15.0) of the alpine image makes it clear exactly which image was used to build a ghc-musl image. Adding apk upgrade, however, means that the packages used depend on the timing of the build. Perhaps this negates the point of using a specific version? Would it be worthwhile to use alpine:latest instead? Note that I recommend running apk upgrade even when using the latest image.
I am working on documentation and am adding a section on security. The way that this issue is resolved will have a big impact on that documentation.
I think that it is essential to provide a way to use the latest packages. Currently, there is no way to do so without editing the Earthfile.
I wrote that adding apk upgrade makes the packages used depend on the timing of the build, but I have since realized that apk update is the actual culprit. Current images already depend on the timing; they are not reproducible since apk update is used. I was thinking about proposing an APK_UPGRADE argument that would allow users to disable upgrades (in the same way that the TEST_STACK argument can be used to disable Stack tests), but I cannot think of a good reason to disable upgrades... I therefore propose that we add apk upgrade.
Since timing is already a factor, I also propose that we make the ALPINE_VERSION argument default to latest. People who want to build images based on a specific version can use --build-arg to specify the tag.
I pushed both of these changes to my upgrade branch, if you would like to try it out.
What do you think?
It mostly looks good to me, @TravisCardwell. But I think we shouldn't update ALPINE_VERSION as it might contain backwards-incompatible changes.
I guess the term reproducibility can mean something a bit different from bit-by-bit equality. I would say that in ghc-musl an image is equivalent if it contains the same set of libraries with the same ABI (i.e. the same minor version).
- We should have a tag that never changes. Users who want bit-by-bit equality would prefer that. It's similar to what we have right now.
- I think we should run apk upgrade periodically and upload the result using a new tag, and also with a mutable tag pointing to the "latest" apk upgrade'd version. This would probably be the image I'd use. I bet Alpine publishes security fixes for previous versions for some time too, so this tag would provide the best of both worlds.
- I personally would not depend on a tag where the underlying ALPINE_VERSION changes, as this means that it can contain new major versions of the libraries, which can break my builds.
So, I think we only disagree on the third item. I am happy to provide that tag, as long as we also provide tags that are pinned to a specific Alpine version.
A major issue with both my approach and yours is that we do not really maintain our previous versions. Someone can depend on, say, v25-ghc902-latest and expect it to get updates; however, as soon as we publish v26 we'll stop updating the v25 images. So this might give a false sense of security. To avoid that, I suggest we stop versioning our library, and solely use dates in our changelog.
I am thinking of publishing all of the tags below:
1. ghc-musl:ghc9.0.2-alpine3.15-20220416
2. ghc-musl:ghc9.0.2-alpine3.15
3. ghc-musl:ghc9.0.2
So, every week a CI task would rebuild the image, publish tag 1 as an immutable pointer, and update tags 2 and 3 as mutable pointers.
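The naming scheme can be sketched as a small shell function. The name `image_tags` and its argument order are hypothetical and only illustrate how the three tags relate; this is not taken from the Earthfile.

```shell
# image_tags NAME GHC ALPINE DATE
# Prints the three proposed tags, from most to least specific.
# (Hypothetical helper for illustration only.)
image_tags() {
  name=$1; ghc=$2; alpine=$3; date=$4
  printf '%s:ghc%s-alpine%s-%s\n' "$name" "$ghc" "$alpine" "$date"  # immutable
  printf '%s:ghc%s-alpine%s\n' "$name" "$ghc" "$alpine"            # mutable
  printf '%s:ghc%s\n' "$name" "$ghc"                               # mutable
}

image_tags ghc-musl 9.0.2 3.15 20220416
```

Only the first tag embeds the build date, so it is the only one that can serve as a fixed reference; the other two move forward with each rebuild.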
Another improvement would be to keep building the project on the last 3 Alpine versions so they keep getting security updates. But this might be too much of a maintenance burden for our scale.
What do you think @TravisCardwell ?
Thank you very much, @utdemir, for the explanation!
The term reproducibility does indeed have various interpretations. The one that I had in mind when I wrote the above is the ability to produce an image with the same versions of packages given a static Earthfile. Currently, the versions of all packages except for those already installed in the base image depend on the state of the package index, since apk update is run. We therefore cannot provide reproducibility like this. End users can only have reproducibility of their builds by using a fixed version of the image. I did not mean that we should remove such tags, just that it would be nice to also provide general tags to ease maintenance when the risks are acceptable.
What you say about minor version upgrades is a very good point. Does Alpine make any guarantees about the change of versions of packages between point releases of Alpine? I read that Alpine stable releases are "point-in-time snapshots of the package archives" that are tested to ensure inter-package compatibility, but I have been unable to find details about how major and minor versions are managed.
I think that it is unlikely that changes in Alpine will break lsupg builds, so I was thinking that it would be acceptable to use general tags so that I do not need to regularly bump the versions and simply worry about issues if they arise. Perhaps this is too optimistic, though, and using Alpine release branch tags would reduce the risks without increasing the maintenance burden much, since branches are only made twice per year.
I like your suggested tagging strategy. It sounds good to me!
When building images in CI, the Alpine version will be set by passing an ALPINE_VERSION argument via --build-arg. End users can do the same. What should the default ALPINE_VERSION in the Earthfile be? I still think that latest would be a good choice, as long as it is clearly documented in the README. Since we would always specify the version when building images for Docker Hub, we would never actually use the default. A significant benefit is that it would never need to be updated. Do you agree?
Providing updated images for the last three Alpine branches sounds good to me as well. I am still curious about how frequently relevant packages have upgrades available. Perhaps we can try it out and see if it is suitable or not.
I just pushed a commit to my upgrade branch that implements the tag syntax that you suggested above.
For example, the following command builds image utdemir/ghc-musl:ghc9.2.2-alpine3.15-20220416.
$ earthly --allow-privileged --build-arg ALPINE_VERSION=3.15 +ghc9.2.2
The image is saved with the most specific tag (1). Do you know of a way to also tag this image with the mutable pointers (2 and 3) from within the Earthfile, or will this need to be done externally?
The date is formatted using the UTC time zone.
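For illustration, a date stamp in this form can be produced in UTC with the standard date utility. This is a sketch of one way to do it, not necessarily how the branch computes the value; the variable name DATE mirrors the argument name used in this thread.

```shell
# Build a YYYYMMDD stamp in UTC so that the tag does not depend on the
# time zone of the machine running the build.
DATE=$(date -u +%Y%m%d)
echo "utdemir/ghc-musl:ghc9.2.2-alpine3.15-${DATE}"
```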
Since I named the ghc targets to match the tags, I renamed the targets to match the new tags. For example, ghc9.2.2 is now used instead of ghc922.
When using ALPINE_VERSION=latest, the tag ends up looking like utdemir/ghc-musl:ghc9.2.2-alpinelatest-20220416. This does not bother me. How about you?
The name:tag arguments to update-readme.sh are passed without change in this branch, but I would like to update how this is done if we decide to go ahead with the new tag format.
I am thinking about how to add support for building images for three Alpine versions as well as conditional building of images (only if there are upgrades available). I am pretty confident that I can do this within the Earthfile without much trouble, though it requires some refactoring. I will likely give it a try if there is a way to add tags to images within the Earthfile. If tagging must be done externally, however, it may be preferable to do this externally as well and keep the Earthfile simple...
By defining the DATE argument at the top level, the base-system layer is forced to update whenever that value changes. The cache is therefore only used for a maximum of one day. This is nice because we want to be sure to check for available upgrades. :smile:
By the way, I am fine with setting the default ALPINE_VERSION to 3.15 if you prefer it.
Quick update:
I'm going on a holiday for the next week, so I won't be able to look at this project (probably the next week too as I'll be busy catching up with other stuff).
But I am happy with your suggestions above.
Do you know of a way to also tag this image with the mutable pointers
I'm unsure :(. Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.
When using ALPINE_VERSION=latest, the tag ends up looking like utdemir/ghc-musl:ghc9.2.2-alpinelatest-20220416. This does not bother me. How about you?
Doesn't look super pretty, but I don't have any better alternative. Happy to leave it to :).
By the way, I am fine with setting the default ALPINE_VERSION to 3.15 if you prefer it.
As long as we also provide pushed images that are tied to a specific Alpine version, I'm happy with any default.
Regarding other Earthfile refactors or introducing another shellscript, I trust your judgement :). Earthly was mostly an experiment for me, and it would be okay for me even if we replace it with Dockerfile's and shell scripts.
When I'm away, please feel free to do any improvements to the codebase, and feel free to merge them to (but of course I'd be happy to review them). As I said, I'll be away next week, but after that we can cut a new release with the changes if we have them merged by then.
No problem, mate! Have a great holiday!
I started logging available upgrades to packages in our Alpine 3.15 images from the 18th. There are none so far. I will continue to log this, and we can use the results to decide how to do the CI.
Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.
Good idea! I will test that.
If I have the time, I will see if I can implement everything within the Earthfile
. I can write scripts if it does not work well. I will likely not merge such changes until you can take a look at it. I am in no rush, so please do not worry about it while you are on your holiday and catching up after your return.
I logged available upgrades from April 16 until July 31, using Alpine 3.15 and GHC 9.2.2 throughout. Of the 107 days logged, 14 of them had upgrades. Here is a quick visualization:
Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.
This works, but there is one strange artifact.
Here are the Earthfile commands:
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"
For some reason the Docker image IDs are not all the same:
REPOSITORY TAG IMAGE ID CREATED SIZE
utdemir/ghc-musl ghc9.2.4 a1d488614526 41 seconds ago 3.68GB
utdemir/ghc-musl ghc9.2.4-alpine3.16.1 a1d488614526 41 seconds ago 3.68GB
utdemir/ghc-musl ghc9.2.4-alpine3.16.1-20220730 f9b223601ff2 2 minutes ago 3.68GB
Inspecting, I see that some empty layers are added:
$ docker history utdemir/ghc-musl:ghc9.2.4-alpine3.16.1-20220730
IMAGE CREATED CREATED BY SIZE COMMENT
f9b223601ff2 3 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 2.22GB buildkit.exporter.image.v0
<missing> 4 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 975MB buildkit.exporter.image.v0
<missing> 4 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 478MB buildkit.exporter.image.v0
<missing> 2 hours ago pulled from docker.io/library/alpine:3.16.1@… 5.52MB buildkit.exporter.image.v0
$ docker history utdemir/ghc-musl:ghc9.2.4-alpine3.16.1
IMAGE CREATED CREATED BY SIZE COMMENT
a1d488614526 2 minutes ago fileop target 0B buildkit.exporter.image.v0
<missing> 2 minutes ago fileop target 0B buildkit.exporter.image.v0
<missing> 3 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 2.22GB buildkit.exporter.image.v0
<missing> 4 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 975MB buildkit.exporter.image.v0
<missing> 4 minutes ago mount / from exec /bin/sh -c ALPINE_VERSION=… 478MB buildkit.exporter.image.v0
<missing> 2 hours ago pulled from docker.io/library/alpine:3.16.1@… 5.52MB buildkit.exporter.image.v0
This is unfortunate, but I do not think it causes any problems.
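A quick way to check whether a set of tags resolves to a single image is to compare the IMAGE ID column of a listing. The following is a minimal sketch that works on listing text piped in on stdin; the function name `same_image_id` is hypothetical, and the column positions are assumed from the docker images output shown above.

```shell
# Reads `docker images`-style rows (REPOSITORY TAG IMAGE_ID ...) on stdin
# and prints "same" if every row shares one image ID, "differ" otherwise.
# A header row starting with REPOSITORY is skipped.
same_image_id() {
  awk '$1 == "REPOSITORY" { next }
       { ids[$3] = 1 }
       END {
         n = 0
         for (id in ids) n++
         if (n == 1) print "same"; else print "differ"
       }'
}

# Example using the rows from the listing above.
same_image_id <<'EOF'
REPOSITORY TAG IMAGE ID CREATED SIZE
utdemir/ghc-musl ghc9.2.4 a1d488614526 41 seconds ago 3.68GB
utdemir/ghc-musl ghc9.2.4-alpine3.16.1 a1d488614526 41 seconds ago 3.68GB
utdemir/ghc-musl ghc9.2.4-alpine3.16.1-20220730 f9b223601ff2 2 minutes ago 3.68GB
EOF
```

For the three rows above the function reports that the IDs differ, matching the observation that the date-stamped tag has its own ID.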
I have only been able to test saving to multiple tags locally, but I doubt that there will be any issues when also pushing to Docker Hub.
I realized that there is another issue with this multi-tagging. If we want to provide images for a given version of GHC using multiple versions of Alpine, then building an image for an older version of Alpine would push the image with the GHC-only tag. This is not desired, as that tag should point to the image using the latest (supported) version of Alpine.
I attempted to fix this by adding new flag arguments to the image target:
ARG TAG_DATE=1
ARG TAG_ALPINE=0
ARG TAG_GHC=0
The idea was to allow use of command-line arguments to specify which tags are created/updated, as follows:
IF [ "$TAG_DATE" = "1" ]
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"
END
IF [ "$TAG_ALPINE" = "1" ]
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
END
IF [ "$TAG_GHC" = "1" ]
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"
END
Unfortunately, this does not work! Earthly gives the following error message:
Error: build target: build main: failed to solve:
Earthfile line 103:2 apply BUILD +image: earthfile2llb for +image:
Earthfile line 95:2 no non-push commands allowed after a --push
in +image --GHC=9.2.4
in +ghc9.2.4
Conditional tagging of multiple images is not supported. If you have any ideas about how this issue could be resolved, please let me know. I will remove the GHC-only tag for now, until we can figure out a solution.
Conditional tagging of multiple images is not supported. If you have any ideas about how this issue could be resolved, please let me know. I will remove the GHC-only tag for now, until we can figure out a solution.
I thought of a solution while documenting the issue in a comment. Multiple conditional tags are not supported due to the above limitation, but we can have a single conditional tag as long as it comes first!
I updated the image target to have a single new argument flag:
ARG TAG_GHC=0
The images are now saved as follows:
IF [ "$TAG_GHC" = "1" ]
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"
END
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"
This works!
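When building one GHC version against several Alpine versions, the flag could be driven from a wrapper script along these lines. This is only a sketch: the function name `build_commands` is hypothetical, the Alpine versions in the example are illustrative, and it assumes that TAG_GHC can be passed with --build-arg in the same way as ALPINE_VERSION. The commands are printed rather than executed.

```shell
# build_commands GHC ALPINE_NEWEST [ALPINE_OLDER...]
# Prints one earthly invocation per Alpine version. Only the first (newest)
# version gets TAG_GHC=1, so the GHC-only tag is never moved to an image
# built on an older Alpine release.
build_commands() {
  ghc=$1; shift
  tag_ghc=1
  for alpine in "$@"; do
    echo "earthly --allow-privileged --build-arg ALPINE_VERSION=$alpine --build-arg TAG_GHC=$tag_ghc +ghc$ghc"
    tag_ghc=0
  done
}

build_commands 9.2.4 3.16 3.15 3.14
```

Listing the newest Alpine version first keeps the rule in one place: whichever version is first in the argument list owns the GHC-only tag.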