zero-to-jupyterhub-k8s

Automatic patches of known vulnerabilities in images for the latest stable chart release

Open consideRatio opened this issue 2 years ago • 4 comments

We have an automated system that rebuilds our Docker images whenever doing so results in a known vulnerability being patched. This allows us to publish new development releases of the Helm chart that reference the new images with the patched vulnerabilities.

The idea when I implemented this was to help users of the Helm chart get images with known vulnerabilities patched. The downside is that to get the patched Docker images, Helm chart users are required to run the absolute latest version of the Helm chart - which is still unreleased, lacks changelog documentation, etc.

Feature idea

I propose the following ideas to help users patch known vulnerabilities.

1. rebuild images for latest chart release

We rebuild and re-publish the images associated with the latest released version of the Helm chart, building from the state of the git repo at that release. This lets our Dockerfiles install more recent versions of apt packages with patched known vulnerabilities.
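As a rough illustration of what such a rebuild could look like, here is a hedged GitHub Actions sketch (workflow name, schedule, and steps are all illustrative, not the project's actual workflow) that checks out the most recent release tag before building, instead of building from main's HEAD:

```yaml
# Hypothetical sketch: scheduled rebuild of images at the latest chart
# release's commit rather than at main's HEAD.
name: rebuild-latest-release-images
on:
  schedule:
    - cron: "0 0 * * *" # daily
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0 # fetch tags so `git describe` can find the release
      - name: Check out the latest release tag
        run: git checkout "$(git describe --tags --abbrev=0)"
      # ...then build and push images as the existing workflow does on main
```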

2. Add pip-audit --fix for our Dockerfiles with Python code

To also patch known Python vulnerabilities, we make use of a tool like pip-audit. I think it can bump specific Python dependencies to patch known vulnerabilities without making other significant changes.
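A hedged sketch of how this could look in a Dockerfile (the file path is illustrative, and this assumes pip-audit's `--fix` flag can rewrite a requirements file given via `--requirement`):

```dockerfile
# Hypothetical sketch: bump vulnerable pinned dependencies at build time,
# so daily rebuilds pick up patched releases without changes in git.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir pip-audit \
 && pip-audit --requirement /tmp/requirements.txt --fix \
 && pip install --no-cache-dir -r /tmp/requirements.txt
```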

This would imply that the versions we have frozen in requirements.txt in the git history could stop reflecting the image's contents over time - but only because pip-audit --fix made a change.

3. Add -original suffix or similar to images

If we do this, we could let users have their images updated over time while also providing a frozen image that won't be updated - for example by appending an -original suffix to the version. So jupyterhub/k8s-hub:1.2.0-original would be a frozen version that never gets patched, while jupyterhub/k8s-hub:1.2.0 would get patched automatically every day.

4. Document the importance of imagePullPolicy

If dynamic versions of the images are used, giving a Pod's container an imagePullPolicy of "Always" ensures the user gets the latest version of an image whenever a Pod starts - even if the k8s Node where the Pod is scheduled already has an outdated version of the image. But this only applies at Pod startup, which means that if a new version of the image is published afterwards, users would need to manually restart the Pod to pick it up.
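For chart users, a hedged sketch of what opting in could look like in the chart's config (this assumes the chart exposes pullPolicy under the image sections, as recent chart versions do):

```yaml
# Sketch of a values.yaml override: always re-check the registry on Pod start
hub:
  image:
    pullPolicy: Always
singleuser:
  image:
    pullPolicy: Always
```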

5. Consider the default value of imagePullPolicy

We currently don't set imagePullPolicy to anything by default for our Pods' containers - which means it will behave as "IfNotPresent", I think. For the patches to be reliably applied, users would need to specify "Always".

A downside of changing this is that the container registry will be more burdened. I think, though, that unless the image checksum has changed, the image won't be re-downloaded - and transferring image layers is probably the most expensive work the container registry does.
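The reason "Always" is cheaper than it sounds is that images are content-addressed: a client compares digests and only downloads when they differ. A minimal Python sketch of that idea (illustrative only, not the actual registry protocol):

```python
import hashlib
from typing import Optional


def digest(blob: bytes) -> str:
    # Container images are content-addressed: identical bytes -> identical digest
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def needs_pull(local_digest: Optional[str], remote_digest: str) -> bool:
    # With pullPolicy "Always", the client still re-checks the remote digest,
    # but only downloads layers when the digest differs from the local copy
    return local_digest != remote_digest


image_v1 = b"layer-contents-v1"
image_v2 = b"layer-contents-v2 (rebuilt with patches)"

print(needs_pull(None, digest(image_v1)))                 # True: no local copy
print(needs_pull(digest(image_v1), digest(image_v1)))     # False: unchanged, skip
print(needs_pull(digest(image_v1), digest(image_v2)))     # True: rebuilt image
```

So the recurring cost of "Always" is mostly the cheap digest check, not repeated full downloads.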

consideRatio avatar Feb 19 '22 09:02 consideRatio

Do you think it would be simpler or more complex to make actual patch releases and do the same workflow we currently have on a 1.2.x-type branch created on release? That seems easier to reason about, as it would be clear which images have been patched or not.

minrk avatar Feb 21 '22 08:02 minrk

@minrk hmmm yeah, I think it would be easier from a UX and documentation writing perspective, but harder from a tech perspective - I did some mental exploration and couldn't see a clear path to a sustainable implementation, but I don't rule out there is one.

Challenges considered

  • [ ] Chartpress fix Currently, I think chartpress requires tags to be on the current branch for its versioning. If we push tags of commits to other branches, we need to ensure to manage that first. See https://github.com/jupyterhub/chartpress/issues/143.
  • [ ] Updating changelog in main branch How do we update the changelog in the main branch after pushing a tag?
  • [ ] Managing version branches/tags both manually and automatically How do we manage version branches / tags with automation that won't conflict with manual work?
  • [ ] How to interact with dependabot? Dependabot would patch our main branch's images/hub/requirements.txt file, but not another branch - it wouldn't apply patches to a dedicated version branch forked from the latest release commit. Can we retain use of dependabot with this system? Can we avoid having dependabot show alerts for things that have already been patched?

consideRatio avatar Feb 21 '22 10:02 consideRatio

Currently, I think chartpress requires tags to be on the current branch for its versioning. If we push tags of commits to other branches, we need to ensure to manage that first.

Answered over there, but I don't think chartpress should be responsible for this.

How do we update the changelog in the main branch after pushing a tag?

It's a little tedious, but it's the same issue any time we have a backport branch to deal with. Usually that means making the changelog changes by hand in both branches. Whether that's forward-porting from the 1.x branch to main after release, or backporting to 1.x after merging changes in main doesn't matter too much. I'm not sure there's another solution.

But we need to communicate these changes anyway, right? If we don't have any version numbers associated, how do we tell folks things changed?

How do we manage version branches / tags with automation that won't conflict with manual work?

We'll need to make sure that the things we currently run only on main (e.g. CI branch filters) also run on backport branches (e.g. a regex that matches (\d+\.)+x or backport/.*, or whatever backport pattern we choose). I'm not sure there's anything else to be done.
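Concretely, a hedged sketch of such a trigger in a GitHub Actions workflow (branch patterns are illustrative):

```yaml
# Hypothetical sketch: run the same CI on main and on backport branches
on:
  push:
    branches:
      - main
      - "*.x"        # e.g. 1.2.x
      - "backport/*" # or whatever backport pattern is chosen
```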

How to interact with dependabot?

Dependabot can be used with multiple branches, though config is a bit tedious.
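A hedged sketch of what that could look like in .github/dependabot.yml, using Dependabot's `target-branch` option (the branch name and directory are illustrative):

```yaml
# Hypothetical sketch: one update config per maintained branch
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/images/hub"
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/images/hub"
    target-branch: "1.2.x" # also keep the backport branch patched
    schedule:
      interval: "weekly"
```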

I don't think that is relevant, because any 1.2.x branch would be forked after creating the 1.2.0 release. So the tags should be right for a backport branch.

All of these issues apply if we want to do any backporting, whether it's related to this security-patches workflow or not, I think. We would need more special handling if we want to update chart/docker tags so they don't match git.

minrk avatar Feb 21 '22 12:02 minrk

I also think we should aim for smaller more frequent releases rather than doing anything fancy with rebuilding images:

  • One key idea behind K8s is immutable infrastructure, changing the contents of tags means two deployments can end up with different systems which is unexpected
  • We need a way to notify people about rebuilt images, which means posting an announcement somewhere and saying what's changed - so that's basically a release anyway. If we don't announce it, then existing admins won't know they need to restart/update their deployments
  • You end up having to decide which releases to maintain and to communicate this. For example, would we maintain 1.1.x, 1.2.x, etc?
  • Most vulnerabilities are irrelevant! A package is marked vulnerable even if the vulnerable function is never used in JupyterHub. If we update too often, people will start ignoring updates, partly due to fatigue, and partly because every update has the potential to disrupt someone's deployment.
  • If an organisation is really concerned about lack of updates they'll have their own workflow in place regardless of whatever we do.

I think releasing every month (or every two months) is enough unless there's a known vulnerability that affects Z2JH.

manics avatar Feb 21 '22 13:02 manics

We've been using JupyterHub for four years or so now but have only recently moved from docker swarm to k8s using the ZeroToJupyterHub project. We were surprised to see when we did so that the good work reducing vulnerabilities in the main dockerhub images has not carried over to those in this project.

I don't recall which of grype or trivy we're currently using for scanning but the main jupyterhub/jupyterhub image only has a handful of low/mediums. By contrast the chart 2.0.0 hub image has 9 criticals, and 100s of highs. Many of these can be addressed by removing vim and sub-components.

It would definitely seem that an improvement in handling vulnerabilities is required whether it be manual release checklists based or a more automated solution.

fifofonix avatar Oct 25 '22 11:10 fifofonix

@fifofonix we have automation rebuilding the images we provide whenever they successfully patch a known vulnerability. See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/actions/workflows/vuln-scan.yaml.

So if rebuilding our images results in fewer vulnerabilities, we do that semi-automatically (new PRs are opened automatically, so we just need to press merge), and a new development release of the chart is made.

consideRatio avatar Oct 25 '22 12:10 consideRatio

I think releasing every month (or every two months) is enough unless there's a known vulnerability that affects Z2JH.

I think that, combined with providing a -slim version of the hub image, may have been a good balance of effort to reduce known vulnerabilities.

I'll close this as I extract no clear action points from it atm.

consideRatio avatar Oct 03 '23 07:10 consideRatio