Develop and implement kernel update policy
Status quo
We currently build and distribute our own grsecurity-patched Linux kernels to get some extra security benefit. Most of the team is not particularly familiar with or experienced in kernel development.
Kernel updates happen on an irregular schedule; typically we only update when a security vulnerability is bad enough to get a name.
Outside input
At HOPE 2022, some of us saw a talk titled An Engineer's Guide to Linux Kernel Upgrades. I would recommend watching it, but if you don't have time, the (IMO) most salient/applicable points were:
- The kernel has new stable releases every week, the vast majority of which have at least one security fix. Every time you miss a release, the risk of regressions when you do eventually upgrade increases.
- Don't try to look at the changelog to see if you need to upgrade - just upgrade.
Both of these make sense: if we upgraded, or at least tested, new kernel versions more frequently, the risk of hitting regressions would be lower, since fewer changes accumulate between upgrades. And our team does not have the experience to read through kernel changelogs and assess the security impact of each change (hence only upgrading on brand-name vulnerabilities).
Strawman proposal
We update to the latest kernel release once a month. This puts updates on a regular cadence, reducing our risk, without being so frequent that it's a burden on the team.
To implement this without dramatically increasing the amount of manual labor needed by the team:
- Set up automation to build new kernel releases and push them to apt-test (a rough sketch of the release-detection side follows this list)
- #6328
- Document specifically the minimum testing needed to give a kernel the green light (is "does it boot" good enough? do we need to actually run SD and test a submission?)
- Set up automation to test new kernels against the real hardware that we support
- #6508
- ...
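For the first item, here's a minimal sketch of the release-detection side, assuming we poll kernel.org's public `releases.json` feed. The tracked series, the state file, and the build/push hand-off are placeholders, not our actual tooling:

```python
#!/usr/bin/env python3
"""Check kernel.org for a new release in the series we track.

Sketch only: the tracked series, state file, and build/push hand-off
are placeholders, not SecureDrop's actual tooling.
"""
import json
import sys
import urllib.request
from typing import Optional

RELEASES_URL = "https://www.kernel.org/releases.json"
TRACKED_SERIES = "6.1"                 # hypothetical: the series we build grsec kernels from
STATE_FILE = "last-built-kernel.txt"   # hypothetical: records the last version we built


def latest_in_series(series: str) -> Optional[str]:
    """Return the newest kernel.org release whose version is in `series`."""
    with urllib.request.urlopen(RELEASES_URL) as resp:
        data = json.load(resp)
    matches = [
        r["version"]
        for r in data.get("releases", [])
        if r["version"].startswith(series + ".")
    ]
    return matches[0] if matches else None


def main() -> int:
    latest = latest_in_series(TRACKED_SERIES)
    if latest is None:
        print(f"no release found for series {TRACKED_SERIES}", file=sys.stderr)
        return 1
    try:
        with open(STATE_FILE) as f:
            already_built = f.read().strip()
    except FileNotFoundError:
        already_built = ""
    if latest == already_built:
        print(f"{latest} already built; nothing to do")
        return 0
    # A real job would kick off the grsecurity patch + build here and,
    # on success, push the packages to apt-test before recording the version.
    print(f"new release {latest}: trigger build and push to apt-test")
    with open(STATE_FILE, "w") as f:
        f.write(latest + "\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Something like this could run as a cron job or a scheduled CI pipeline, with the actual build and apt-test push handled by the existing packaging tooling.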
Regressions haven't really been an issue within a particular kernel series. Typically the issue that does pop up is missing chipset support for new Intel generations. (We're usually a few minor versions behind the bleeding edge, and so, for example, Ethernet driver support that vendors would typically backport into their kernels may not be in ours.) But it absolutely makes sense to update kernels more frequently across the server and workstation. Looking at your list, IMO the preferred priority order for this work would be:
- #6328 - this makes pushing a new kernel a lot less labor-intensive and just makes sense
- #6508 - this is kind of a separate topic IMO, but it would give instances on non-standard hardware the option to self-test
- Automation to test new kernels against supported hardware (rough smoke-test sketch after this list)
- Automation to build kernels (this is iffy without reproducible builds, actually maybe that should be a separate point)
- Documentation on acceptance testing for kernels (we kind of already have that in the kernel testing part of the QA matrix)
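On the hardware-test point, even a minimal "does it boot and come up clean" check could be scripted. A sketch, assuming SSH access to a dedicated test machine and that `dmesg` is readable by the test user; the host name and the specific checks are made up for illustration:

```python
#!/usr/bin/env python3
"""Post-reboot smoke test for a kernel candidate on a hardware test box.

Sketch only: the host name and the three checks are illustrative, not
the project's actual acceptance criteria.
"""
import subprocess
import sys

TEST_HOST = "sd-kernel-testbox"   # hypothetical hardware test machine


def ssh(command: str) -> str:
    """Run a command on the test host over SSH and return its stdout."""
    result = subprocess.run(
        ["ssh", TEST_HOST, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def main(expected_version: str) -> int:
    failures = []

    # 1. Did the box come back up on the candidate kernel?
    running = ssh("uname -r")
    if not running.startswith(expected_version):
        failures.append(f"running kernel is {running}, expected {expected_version}")

    # 2. Is a non-loopback interface up? (catches the missing-NIC-driver case)
    links = ssh("ip -brief link show up")
    if not [l for l in links.splitlines() if l and not l.startswith("lo")]:
        failures.append("no non-loopback network interface is up")

    # 3. Any error-level kernel messages since boot?
    errors = ssh("dmesg --level=err,crit,alert,emerg || true")
    if errors:
        failures.append("dmesg reports error-level messages:\n" + errors)

    if failures:
        print("kernel smoke test FAILED:", file=sys.stderr)
        for failure in failures:
            print(f" - {failure}", file=sys.stderr)
        return 1
    print(f"kernel {expected_version} passed the basic smoke test on {TEST_HOST}")
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <expected-kernel-version>", file=sys.stderr)
        sys.exit(2)
    sys.exit(main(sys.argv[1]))
```

The interesting part is check 2: the missing chipset/NIC support described above is exactly the kind of regression a check like this would catch before release.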
Oooh, reproducible builds is a good point.
Mostly agreed with your priority list, except for the Documentation point, which I think should be towards the top. It already mostly exists; it just needs to be explicitly spelled out if we're separating SD and kernel releases.
We've now done two updates through this system, which went pretty well IMO. I filed tasks for the two issues that didn't have them yet, so I think we can close this and follow up on those independently.