Use nixops implementation of key services

Open cprussin opened this issue 1 year ago • 6 comments

Resurrecting #116. From testing, it looks like the systemd path units aren't actually reliable at triggering when files are created. They work fine for keys configured with a destDir where the keys already exist on boot, but they don't work consistently if you boot and then run colmena upload-keys.

This PR replaces the systemd path based implementation with the proven-reliable implementation from nixops. I literally copy-pasted the unit configuration over from here.

I've confirmed on my own configs that this fixes all the problems I was having with the reliability of the key services. However, I haven't yet put together a minimal repro. I could work on one if you feel it's necessary, @zhaofengli, but it would be a bit of a pain and I'm now very confident in this change, so I'd prefer not to :).

cprussin avatar Aug 20 '22 06:08 cprussin

Hey @zhaofengli ! Any objection to merging this? I'd love to move back to the main fork!

cprussin avatar Sep 10 '22 03:09 cprussin

Hi, I'll take a look at this today. The original NixOps approach doesn't seem to interact well with uploadAt = "post-activation"; keys (activation stalls indefinitely waiting for the key service to start), so there will be some changes required on that front.

Edit: More delays :cry: I'll get to it this week.

zhaofengli avatar Sep 13 '22 19:09 zhaofengli

@zhaofengli thanks a bunch! I actually think this is a good sign though; if the units for post-activation keys were activating even though the keys hadn't been uploaded, doesn't that illustrate exactly the problem I'm trying to fix (namely, that the unit files wouldn't reliably indicate that the keys had uploaded)?

So now we just need a way to configure the unit as non-blocking for activation. I don't think anything in my change actually configures any dependencies; perhaps there's something in the test case you're running that's blocking activation that has a dependsOn or something? Can you show me exactly what the test case is here?

cprussin avatar Sep 16 '22 00:09 cprussin

The test cases are here, which you can run with nix-build integration-tests -A apply.

zhaofengli avatar Sep 16 '22 00:09 zhaofengli

@zhaofengli I fixed the failing test in commit 33bca1c8c53164f8b62d08444650b5d1a582d5c9, but that test wasn't related to the post-activation keys, so am I misunderstanding something? (To be clear, in the new implementation the service isn't supposed to go inactive; it's "activating" while awaiting the key, and "active" once the key lands.)
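
For anyone following along, the "activating" vs. "active" semantics can be roughly sketched as below. This is only an illustration in Python, not the actual unit configuration from NixOps or this PR: the function names `wait_for_key` and `key_service_state` are hypothetical, and it polls for the file rather than using whatever file-watching mechanism the real unit relies on.

```python
import os
import time


def wait_for_key(path, poll_interval=0.05, timeout=None):
    """Block until `path` exists.

    Returns True once the file is present, or False if `timeout`
    seconds elapse first. With timeout=None it waits indefinitely,
    which is the analogue of a unit staying in "activating".
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def key_service_state(path):
    """Illustrative mapping from key presence to the state names used above."""
    return "active" if os.path.exists(path) else "activating"
```

The point of the sketch is that the check is on the file's presence itself, so a key that already exists when the wait starts and a key that shows up later are handled the same way. It also shows why anything that waits on such a service without a timeout would stall until the key is uploaded.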

cprussin avatar Sep 16 '22 03:09 cprussin

I was playing with the idea separately (pushed here) so as to avoid looking at or copying the LGPL code from NixOps directly, even though the amount of code is minimal. With that applied, the apply test got stuck during activation in the first colmena apply command. I'll take a look later over the weekend.

zhaofengli avatar Sep 16 '22 07:09 zhaofengli