colmena
Use nixops implementation of key services
Resurrecting #116. From testing, it looks like the systemd path units aren't actually reliable for triggering when files are created. They work fine for keys configured with a `destDir` such that the keys exist on boot, but they don't work consistently if you boot and then run `colmena upload-keys`.
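For readers unfamiliar with the mechanism, a path-triggered unit of the general kind being replaced might look roughly like this. It is only a sketch using standard NixOS options, not Colmena's actual code; the unit name `example-key` and the file `/run/keys/example.secret` are hypothetical.

```nix
{ pkgs, ... }:
{
  # Hypothetical path unit: asks systemd to start example-key.service
  # once the key file exists.
  systemd.paths.example-key = {
    wantedBy = [ "paths.target" ];
    pathConfig.PathExists = "/run/keys/example.secret";
  };

  # Service with the same name, which the path unit above activates.
  systemd.services.example-key = {
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = "${pkgs.coreutils}/bin/true";
    };
  };
}
```

The behaviour reported above is that this style of trigger fires when the file is already present at boot, but not consistently when the file is created later.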
This PR replaces the systemd path based implementation with the proven-reliable implementation from nixops. I literally copy-pasted the unit configuration over from here.
I've confirmed on my own configs that this fixes all the problems I was having with the reliability of the key services. However, I haven't yet put together a minimal repro. I could work on one if you feel it's necessary, @zhaofengli, but it would be a bit of a pain, and I'm very confident in this change now, so I'd prefer not to :).
Hey @zhaofengli ! Any objection to merging this? I'd love to move back to the main fork!
Hi, I'll take a look at this today. The original NixOps approach doesn't seem to interact well with `uploadAt = "post-activation";` keys (activation stalls indefinitely waiting for the key service to start), so there will be some changes required on that front.
Edit: More delays :cry: I'll get to it this week.
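For reference, a key of the kind mentioned here could be declared roughly as follows. This is a minimal sketch: the key name `example.secret` and its contents are placeholders, and `destDir` is spelled out only for clarity.

```nix
{
  # Hypothetical key that Colmena uploads only after the new
  # system configuration has been activated.
  deployment.keys."example.secret" = {
    text = "not-a-real-secret"; # placeholder value
    destDir = "/run/keys";
    uploadAt = "post-activation";
  };
}
```

If the matching key service blocks until the file exists, it presumably cannot finish starting before activation completes, which would be consistent with the stall described above.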
@zhaofengli thanks a bunch! I actually think this is a good sign though; if the units for post-activation keys were activating even though the keys hadn't been uploaded, doesn't that illustrate exactly the problem I'm trying to fix (namely, that the unit files wouldn't reliably indicate that the keys had uploaded)?
So now we just need a way to configure the unit as non-blocking for activation. I don't think anything in my change actually configures any dependencies; perhaps there's something in the test case you're running that's blocking activation that has a `dependsOn` or something? Can you show me exactly what the test case is here?
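To illustrate the kind of dependency in question, a service that waits on a key might be wired up roughly like this. This is a sketch under assumptions: the `<name>-key.service` naming follows Colmena's documented convention, while `example-consumer` and the key itself are hypothetical and not taken from the test suite.

```nix
{ pkgs, ... }:
{
  deployment.keys."example.secret" = {
    text = "not-a-real-secret"; # placeholder value
  };

  # Hypothetical consumer that must not start before the key is present.
  systemd.services.example-consumer = {
    wantedBy = [ "multi-user.target" ];
    requires = [ "example.secret-key.service" ];
    after = [ "example.secret-key.service" ];
    serviceConfig.ExecStart = "${pkgs.coreutils}/bin/sleep infinity";
  };
}
```

With ordering like this, anything that waits for `example-consumer` to start also ends up waiting for the key service.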
The test cases are here, which you can run with `nix-build integration-tests -A apply`.
@zhaofengli I fixed the failing test in commit 33bca1c8c53164f8b62d08444650b5d1a582d5c9, but that test was not related to the post-activation keys, so am I misunderstanding something? To be clear, in the new implementation the service isn't supposed to go inactive; it's "activating" while awaiting the key, and "active" once the key lands.
I was playing with the idea separately (pushed here) so as to avoid looking at or copying the LGPL code from NixOps directly, even though the amount of code is minimal. With that applied, the `apply` test got stuck during activation in the first `colmena apply` command. I'll take a look later during the weekend.