nixops-aws
Recover encrypted EBS volumes upon spot instance kill/termination
Currently, the LUKS passphrase used to encrypt an EBS volume is generated during EC2 instance creation and deleted if we lose the machine or destroy it. It might be more appropriate to make the LUKS passphrase part of the EBS volume's physical specification as well (created during volume provisioning if an option encrypt is set to true), so that it can be passed to the instance's filesystem options and used directly.
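A rough sketch of what the proposal could look like in a network expression. Both the `encrypt` option on the EBS volume resource and the `passphrase` attribute read back from it are hypothetical names used only to illustrate the idea; neither is claimed to exist in NixOps today.

```nix
{
  resources.ebsVolumes.data-vol = {
    region = "us-east-1";
    zone = "us-east-1b";
    accessKeyId = "lb-dev";
    size = 50;
    # Hypothetical option: generate and store a LUKS passphrase as part of
    # the volume's own state when the volume is provisioned.
    encrypt = true;
  };

  machine = { resources, ... }: {
    fileSystems."/home" = {
      fsType = "ext4";
      device = "/dev/mapper/xvdf";
      autoFormat = true;
      ec2.disk = resources.ebsVolumes.data-vol;
      ec2.encrypt = true;
      # Hypothetical attribute: the passphrase would come from the volume
      # resource, so it would outlive any single instance.
      ec2.passphrase = resources.ebsVolumes.data-vol.passphrase;
    };
  };
}
```

With the passphrase tied to the volume rather than to the instance, a replacement spot instance could in principle reattach and unlock the same volume.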
Example of the issue:
{ account ? "lb-dev"
, region ? "us-east-1"
, zone ? "us-east-1b"
, instanceType ? "r3.2xlarge"
, volumeSize ? 50
, rootSize ? 50
, spotInstancePrice ? 301
, description ? "Experiment with spot instances"
, ...
}:
{
  network.description = description;
  resources.ec2KeyPairs.keypair = { accessKeyId = account; inherit region; };
  resources.iamRoles.role = { lib, ... }: {
    accessKeyId = account;
    policy = builtins.toJSON
      {
        Statement = [
          {
            Effect = "Allow";
            Action = [ "sns:Publish" "sns:ListTopics" ];
            Resource = "*";
          }
        ];
      };
  };
  resources.ebsVolumes.data-vol = { resources, ... }:
    {
      inherit region zone;
      accessKeyId = account;
      size = resources.machines.machine.fileSystems."/home".ec2.size;
      disk = resources.machines.machine.fileSystems."/home".ec2.disk;
    };
  machine = { resources, lib, pkgs, ... }:
    {
      deployment.targetEnv = "ec2";
      deployment.ec2 = {
        inherit region zone instanceType spotInstancePrice;
        accessKeyId = account;
        securityGroups = [ "admin" ];
        keyPair = resources.ec2KeyPairs.keypair.name;
        ebsInitialRootDiskSize = rootSize;
        ebsOptimized = true;
        instanceProfile = resources.iamRoles.role.name;
      };
      fileSystems."/home" = {
        fsType = "ext4";
        options = [ "noatime" "nodiratime" ];
        device = "/dev/mapper/xvdf";
        autoFormat = true;
        ec2.disk = resources.ebsVolumes.data-vol;
        ec2.size = volumeSize;
        ec2.encrypt = true;
        ec2.volumeType = "gp2";
      };
      environment.systemPackages = [ pkgs.awscli ];
    };
}
To reproduce the issue, deploy the above network and then terminate the instance by running aws ec2 terminate-instances --instance-ids <machine_id>
Redeploying then triggers the following failure:
machine.> A dependency job for local-fs.target failed. See 'journalctl -xe' for details.
machine.> the following new units were started: keys.target, nixops-keys.service, sshd-keygen.service
machine.> warning: the following units failed: cryptsetup-xvdf.service
machine.>
machine.> ● cryptsetup-xvdf.service - Cryptographic Setup of Device /dev/mapper/xvdf
machine.> Loaded: loaded (/nix/store/hm9lznk67dm7ky1x8zfx3vpbkfazw7jw-unit-cryptsetup-xvdf.service/cryptsetup-xvdf.service; bad; vendor preset: enabled)
machine.> Active: failed (Result: exit-code) since Sun 2016-12-04 22:47:33 UTC; 1min 27s ago
machine.> Process: 3581 ExecStart=/nix/store/22q816nwgzjq6dfasvkrs8hgfsfsfl3f-unit-script/bin/cryptsetup-xvdf-start (code=exited, status=2)
machine.> Main PID: 3581 (code=exited, status=2)
machine.>
machine.> Dec 04 22:47:31 machine systemd[1]: Starting Cryptographic Setup of Device /dev/mapper/xvdf...
machine.> Dec 04 22:47:33 machine cryptsetup-xvdf-start[3581]: No key available with this passphrase.
machine.> Dec 04 22:47:33 machine systemd[1]: cryptsetup-xvdf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
machine.> Dec 04 22:47:33 machine systemd[1]: Failed to start Cryptographic Setup of Device /dev/mapper/xvdf.
machine.> Dec 04 22:47:33 machine systemd[1]: cryptsetup-xvdf.service: Unit entered failed state.
machine.> Dec 04 22:47:33 machine systemd[1]: cryptsetup-xvdf.service: Failed with result 'exit-code'.
machine.> error: unable to activate new configuration
error: activation of 1 of 1 machines failed (namely on ‘machine’)
Is this still the case, that there is no way to recover the passphrase if the machine is terminated?
@eqyiel I believe it is still the case if you delete or terminate the deployment: unless you have a backup of the state file from when the machine was still up, your key is gone. If the spot instance is lost on its own, you can still find the key in the state file.
I think once https://github.com/NixOS/nixops/pull/1048 is merged into master, we could work around this issue by generating the passphrase as an output resource and using its value for the device encryption.
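Until something like that is available, a possible stopgap is to supply the passphrase explicitly instead of relying on the generated one, so that it no longer lives only in the state file. The sketch below assumes that the `passphrase` option of the EC2 disk options is accepted under `fileSystems.<name>.ec2` and takes precedence over the auto-generated passphrase; the value shown is a placeholder. It also assumes you accept that the passphrase may then end up in the instance's Nix store.

```nix
{
  machine = { resources, ... }: {
    fileSystems."/home" = {
      fsType = "ext4";
      device = "/dev/mapper/xvdf";
      autoFormat = true;
      ec2.disk = resources.ebsVolumes.data-vol;
      ec2.size = 50;
      ec2.encrypt = true;
      # Assumption: an explicit passphrase replaces the generated one, so a
      # replacement instance can unlock the existing volume on redeploy.
      # Placeholder value; keep the real secret out of version control.
      ec2.passphrase = "replace-with-a-long-random-secret";
    };
  };
}
```

This only helps volumes that were formatted with the explicit passphrase in the first place; a volume already encrypted with a lost generated key stays unreadable.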