iPXE fails to boot when using the latest LetsEncrypt trust chains

Open nshalman opened this issue 3 years ago • 6 comments

Any HTTPS server using LetsEncrypt might suddenly stop working with our iPXE binaries after the next certificate refresh.

Expected Behaviour

iPXE boots normally

Current Behaviour

Invalid argument (http://ipxe.org/1c0de802)

Context

  • @grahamc's boot server refreshed its LE certificate and iPXE booting in @equinixmetal datacenters started failing.
  • https://community.letsencrypt.org/t/production-chain-changes/150739
  • ipxe/ipxe#116

nshalman avatar Jun 16 '21 20:06 nshalman

With NixOS 21.05 you can work around this by preferring Let's Encrypt's shorter chain:

   security.acme = {
     acceptTerms = true;
     email = "[email protected]";
-    certs."${domain}".keyType = "rsa4096";
+    certs."${domain}" = {
+      keyType = "rsa4096";
+      extraLegoRunFlags = [
+        # re: https://community.letsencrypt.org/t/production-chain-changes/150739/1
+        # re: https://github.com/ipxe/ipxe/pull/116
+        # re: https://github.com/ipxe/ipxe/pull/112
+        # re: https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007042.html
+        "--preferred-chain" "ISRG Root X1"
+      ];
+    };
   };
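
One way to confirm that the workaround took effect is to look at how many certificates the issued bundle contains, since a shorter chain should carry fewer of them. The following is a minimal sketch, not part of the original thread: the function names `split_pem_chain` and `summarize_chain` are hypothetical, and it only counts and sizes the PEM blocks without validating them.

```python
import base64
import re

# Matches each PEM-encoded certificate block in a bundle such as fullchain.pem.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(pem_text: str) -> list[bytes]:
    """Return the DER bytes of every certificate in a PEM bundle, in order."""
    return [
        # Strip newlines and other whitespace before decoding the base64 body.
        base64.b64decode("".join(body.split()))
        for body in _PEM_CERT.findall(pem_text)
    ]


def summarize_chain(pem_text: str) -> dict:
    """Count the certificates in a bundle and total their DER size in bytes."""
    ders = split_pem_chain(pem_text)
    return {"certificates": len(ders), "der_bytes": sum(len(d) for d in ders)}
```

Assuming the bundle is stored somewhere like `/var/lib/acme/<domain>/fullchain.pem` (an assumed path; adjust for your setup), run `summarize_chain(open(path).read())` before and after the change. If the certificate count or byte total does not drop, the preferred chain was probably not applied.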

grahamc avatar Jun 16 '21 20:06 grahamc

I noticed that https://github.com/ipxe/ipxe/pull/116 was merged. I believe this should be fixable by simply upgrading which version of ipxe we use.

tstromberg avatar Aug 27 '21 03:08 tstromberg

> I noticed that ipxe/ipxe#116 was merged. I believe this should be fixable by simply upgrading which version of ipxe we use.

Uh, am I missing something? It doesn't look merged to me...

nshalman avatar Aug 27 '21 18:08 nshalman

Should we just apply the patch ourselves? How would you like to proceed @nshalman?

mmlb avatar Oct 19 '21 15:10 mmlb

I'm not sure. I think we've worked around this in our production environment by preferring the alternate chain. I think we can wait a little longer to see whether upstream finally merges that change, and then we can just advance to the latest.

Also, given #213 and a desire to re-work how we handle our iPXE builds, perhaps we should defer any changes for this to when we have a broken-out iPXE build...

nshalman avatar Oct 19 '21 16:10 nshalman

I spent some time looking at the current state of this issue, especially for projects that have successfully worked around it, projects that are carrying the PR 116 patch, and organizations that are blocked for whatever reason.

workarounds

  • Flatcar uses the "ISRG Root X1" chain, per https://github.com/flatcar-linux/Flatcar/issues/527
  • NixOS uses the "ISRG Root X1" chain, as above. https://github.com/tinkerbell/boots/issues/166#issuecomment-862704961

carrying the patch

  • Arch Linux incorporated the TLS fragmentation patch, as noted at https://github.com/archlinux/svntogit-community/commit/3daaf0532e3457222997fbdfeb37fbd12c5bf237
  • netboot.xyz has a testing fork for the PR 116 patch, at https://github.com/netbootxyz/ipxe/tree/testing_upstream_pr116 noted at https://github.com/netbootxyz/netboot.xyz/pull/920 - unclear to me if this is in production

blocked

  • Harvester has an open issue at https://github.com/harvester/harvester/issues/2226 and documentation with some workarounds at https://github.com/harvester/ipxe-examples/tree/main/equinix
  • XCP-ng (verbal report, no public issue as of yet)

vielmetti avatar Jul 01 '22 14:07 vielmetti