systemd service failure inhibits system configuration activation

Open PAI5REECHO opened this issue 3 years ago • 2 comments

Whenever a nixops deployment is made to a system that has a systemd service in an activating (auto-restart) or failed state, the deployment fails. I don't understand why nixops is designed this way.

test.........> setting up tmpfiles
test.........> the following new units were started: [email protected]
test.........> warning: the following units failed: restic-backups-external.service
test.........> 
test.........> ● test.service - test
test.........>      Loaded: loaded (/etc/systemd/system/test.service; linked; preset: enabled)
test.........>      Active: activating (auto-restart) since Sun 2022-08-14 12:00:08 UTC; 2h 9min ago
test.........> TriggeredBy: ● test.timer
test.........>    Main PID: 8780 (code=exited, status=1/FAILURE)
test.........>         CPU: 512ms
test.........> error: Traceback (most recent call last):
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 906, in worker
    raise Exception(
Exception: unable to activate new configuration (exit code 4)

Traceback (most recent call last):
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/bin/.nixops-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/__main__.py", line 56, in main
    args.op(args)
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/script_defs.py", line 715, in op_deploy
    depl.deploy(
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1365, in deploy
    self.run_with_notify("deploy", lambda: self._deploy(**kwargs))
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1354, in run_with_notify
    f()
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1365, in <lambda>
    self.run_with_notify("deploy", lambda: self._deploy(**kwargs))
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 1300, in _deploy
    self.activate_configs(
  File "/nix/store/8myxcs76bsyg37n21x2xwnj6srfwfxxm-python3.10-nixops-2.0.0/lib/python3.10/site-packages/nixops/deployment.py", line 947, in activate_configs
    raise Exception(
Exception: activation of 1 of 1 machines failed (namely on ‘test’)

PAI5REECHO avatar Aug 14 '22 18:08 PAI5REECHO

Me neither, if what you're saying is that something was skipped because of the error.

Stopping a deployment halfway is incompatible with declarative deployments that do not specify dependencies (we don't), and it is also incompatible with the idea of letting the distributed system converge towards an acceptable (or fully) operational state. That said, using the deployment process for feedback about the system seems useful. Did your deployment skip anything because of the error? If so, that would be an issue that needs correcting.

Also, we shouldn't be emitting a stack trace for this type of error, and the log should be clear about what did and did not happen.

TODO

  • [ ] check that errors are collected but do not interrupt parallel changes
  • [ ] report such errors with clarity as to what happened. Specifically answer the question whether a re-deployment is necessary.
  • [ ] do not report a stack trace for expected errors that are handled properly
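A minimal sketch of what the items above could look like, not the actual nixops code: each machine is activated in parallel, expected failures are collected instead of aborting the other workers, and the result is reported as a short summary rather than a traceback. The names `ActivationError`, `activate_all`, `summarize`, and the `activate` callback are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


class ActivationError(Exception):
    """Hypothetical 'expected' error: activation returned a non-zero exit code."""

    def __init__(self, machine: str, exit_code: int) -> None:
        super().__init__(f"activation on '{machine}' exited with code {exit_code}")
        self.machine = machine
        self.exit_code = exit_code


def activate_all(
    machines: List[str],
    activate: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, Optional[ActivationError]]:
    """Activate every machine; collect expected errors instead of stopping early."""

    def run(name: str) -> Optional[ActivationError]:
        try:
            activate(name)
            return None
        except ActivationError as err:
            # Expected failure: record it so the other machines still proceed.
            return err

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(run, machines))
    return dict(zip(machines, outcomes))


def summarize(results: Dict[str, Optional[ActivationError]]) -> str:
    """Build a plain-text report of what did and did not succeed, without a traceback."""
    failed = {name: err for name, err in results.items() if err is not None}
    lines = [f"activated {len(results) - len(failed)} of {len(results)} machines"]
    for name in sorted(failed):
        lines.append(f"  {name}: activation failed (exit code {failed[name].exit_code})")
    return "\n".join(lines)
```

Only `ActivationError` is caught here, so an unexpected exception would still propagate with its traceback, which keeps genuine bugs visible while expected failures get a readable report.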

roberth avatar Aug 15 '22 14:08 roberth

> Did your deployment skip anything because of the error?

Yes, the system activation fails due to a failing or pending systemd service, so no changes to the system are applied, which is unexpected. Activation shouldn't depend on the health of systemd services.

PAI5REECHO avatar Aug 16 '22 06:08 PAI5REECHO