deploy-rs icon indicating copy to clipboard operation
deploy-rs copied to clipboard

Magic rollback gets confused with multiplexed SSH connections

Open brprice opened this issue 4 years ago • 4 comments

[ I'm not sure whether this should be considered a bug, or a PEBKAC. If the latter, it would be nice to document as a pitfall to avoid]

In my .ssh/config, I have set up automatic connection sharing via

Host *
        ControlMaster auto
        ControlPath ~/.ssh/%r@%h:%p

Thus, all ssh sessions to the same host go via the same connection. However, this means that the ssh session initiated by the rollback checker does not necessarily test whether ssh still works - it may just be multiplexed into an existing connection!

Consider the following (slightly contrived) scenario:

  • You are about to do a deploy which will break ssh so that new connections are rejected but existing ones carry on working (e.g. by disabling the ssh daemon)
  • Before doing so, you ssh <deploy-host> for some reason (to check disk usage, perhaps)
  • You leave that ssh session open, and in another terminal you run the deploy
  • The deploy's ssh sessions are multiplexed into the existing one. This doesn't matter for the copy and activate, but means that the rollback-check ssh erroneously succeeds (as it uses the still running existing connection is still up, and thus doesn't check whether a fresh connection will succeed)
  • Thus the deploy is activated and seems successful
  • You now close the manually-opened ssh connection ("disk usage checker" in this example)
  • Access is now lost to the remote box! Any new ssh <deploy-host> will now fail, as it won't accept the new connection.

I have seen this scenario happen when testing in a VM. I have not seen any issue without the manual ssh connection. (But since deploy-rs will ssh multiple times, I am slightly concerned there could be a race condition where the "activate" connection is not closed in time and thus is shared with the "rollback check" session resulting in the same issue. However, I have not checked the code - this concern may well be baseless)

The solution would be to add the -S none option to (the rollback-check's) ssh options, if we decide this is a bug we should fix, rather than user error.

brprice avatar Jul 23 '21 11:07 brprice

I made this https://github.com/2xsaiko/deploy-rs/commit/2dcbd4c1ae28fe8846594dd1d28fd5fcc3020cc8 to improve this issue, so that you have the option of using different SSH arguments and a different user for the SSH connection for the rollback check. This actually allows connection multiplexing to work correctly (I hope) instead of disabling it. (I actually wanted to get this to work correctly because of #54, I want to use a Yubikey as well)

Snippet from my configuration:

{
  deploy.nodes = optionalAttrs (hasAttr name network) {
    ${name} = {
      hostname = network.${name}.dnsName;
      sshUser = "root";
      user = "root";
      sshOpts = ["-i" "admin/id" "-S" "admin/%r@%h:%p" "-o" "ControlMaster=auto" "-o" "ControlPersist=60"];
      checkSshUser = "saiko";
      checkSshOpts = [];

      profiles.system = {
        path = deploy-rs.lib.${system}.activate.nixos self.nixosConfigurations.${name};
      };
    };
  };
}

Note the new checkSshUser and checkSshOpts options. I haven't submitted a PR yet though since I want to improve it a bit more.

2xsaiko avatar May 15 '22 22:05 2xsaiko

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

tecosaur avatar Sep 19 '22 05:09 tecosaur

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

That certainly works, and I would be happy if we decided that was the recommended solution and documented this pitfall to avoid.

Personally I'd prefer to automatically disable multiplexing for magic rollback, since I don't see any situation where one would want to multiplex that connection, and it would be nice (as a user) to not have to be aware that I should add that to every deploy-rs config (or not enable multiplexing in .ssh/config globally).

brprice avatar Oct 17 '22 14:10 brprice

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

Actually, on second thought, I think this should be ControlPath=none.

brprice avatar Jul 01 '23 10:07 brprice