Molecule scenarios fail with duplicate DHCP leases

Open cfm opened this issue 3 years ago • 4 comments

Description

In testing #6493, the app and mon VMs created by molecule create -s libvirt-{staging,prod}-focal are given the same DHCP lease, which causes one or the other VM to be inaccessible and molecule converge to fail.

Steps to Reproduce

$ make staging

Expected Behavior

# journalctl -f
[...]
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:99:ac:6c app-staging
[...]
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.253 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.253 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.253 52:54:00:e8:72:f2 mon-staging
[...]

Actual Behavior

# journalctl -f
[...]
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:99:ac:6c app-staging
[...]
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:e8:72:f2 mon-staging
[...]
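The symptom in the log above is one IP address acknowledged for two different MAC addresses. A minimal sketch for spotting that automatically, assuming log lines in the `DHCPACK(iface) IP MAC [hostname]` layout shown here; the function name `find_duplicate_leases` is hypothetical and not part of SecureDrop or dnsmasq:

```shell
# Report any IP address that dnsmasq has acknowledged for more than one MAC.
# Reads log lines on stdin and assumes the "DHCPACK(iface) IP MAC [host]"
# field order seen in the excerpts above.
# Exit status: 0 if no duplicates were found, 1 otherwise.
find_duplicate_leases() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^DHCPACK\(/) {
                    ip = $(i + 1)
                    mac = $(i + 2)
                    # Count each (IP, MAC) pair once, even across renewals.
                    if (!((ip, mac) in seen)) {
                        seen[ip, mac] = 1
                        count[ip]++
                        macs[ip] = (macs[ip] == "" ? mac : macs[ip] " " mac)
                    }
                }
            }
        }
        END {
            status = 0
            for (ip in count) {
                if (count[ip] > 1) {
                    print ip ": " macs[ip]
                    status = 1
                }
            }
            exit status
        }
    '
}
```

For example, `journalctl --no-pager | find_duplicate_leases` on the host should print one line per contested address, followed by the MACs that were given it.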

Comments

This appears to have started with the v202206.03.0 release of the bento/ubuntu-20.04 Vagrant box, and it goes away when the box is pinned to an earlier version:

$ vagrant box add --provider virtualbox bento/ubuntu-20.04 --box-version 202112.19.0

cfm, Jul 19 '22 23:07

A few ephemeral test runs ago I discovered that, with bento/ubuntu-20.04 v202206.03.0, both app and mon had identical /etc/machine-id values. If the DHCP client derives its identifier from the machine ID, handing both VMs the same lease is not only expected but correct behavior as far as DHCP is concerned. I'll come back to this hypothesis after #6493.
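One way to test this hypothesis is to copy /etc/machine-id off each VM and compare the copies. A minimal sketch, assuming the files have already been saved locally by whatever means is available; the function name `machine_ids_identical` and the file names in the usage note are hypothetical:

```shell
# Exit 0 if all given machine-id files hold the same ID, 1 if any differ,
# 2 on usage errors (fewer than two files) or unreadable files.
machine_ids_identical() {
    [ "$#" -ge 2 ] || return 2
    reference_id=$(cat "$1") || return 2
    shift
    for id_file in "$@"; do
        current_id=$(cat "$id_file") || return 2
        [ "$current_id" = "$reference_id" ] || return 1
    done
    return 0
}
```

With copies saved as `app.machine-id` and `mon.machine-id`, `machine_ids_identical app.machine-id mon.machine-id && echo "same machine ID"` would confirm the duplicate. As I understand it, the default DHCP client identifier on Ubuntu 20.04 is derived from the machine ID rather than the MAC address, which would explain why two VMs with different MACs get the same lease.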

cfm, Jul 20 '22 00:07

Reported upstream in chef/bento#1421. Workaround documented in freedomofpress/securedrop-docs#364.

cfm, Jul 20 '22 19:07

Could we nuke/regenerate /etc/machine-id on the VM as part of provisioning?

zenmonkeykstop, Jul 20 '22 19:07

Not with our existing Ansible playbooks. DHCP actually succeeds for both VMs, but the duplicate lease leads to undefined behavior like the following, which is what I tripped over in discovering this bug. Compare hostnames and IP addresses:

| What Ansible means to do | What Ansible actually does | What happens |
| --- | --- | --- |
| ssh app "apt-get update" | ssh 192.168.121.142 "apt-get update" | |
| ssh mon "apt-get update" | ssh 192.168.121.142 "apt-get update" | ✗, apt-get lock contention |

So Ansible and anything else SSH-based are out. It looks like we could probably add a vm.provision :shell item to the instance_raw_config_args passed to Vagrant. But since this can be worked around by adding an argument to a shell command that only developers run, and that they already enter manually, I thought we could start there, especially since upstream is likely to fix what seems to be a regression.
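For reference, the body of such a shell provisioner might look roughly like the following. This is an untested sketch, not something wired into the Molecule scenarios: the function name `write_new_machine_id` is hypothetical, and it writes a random ID by hand, whereas `systemd-machine-id-setup` is the more usual tool on a systemd host.

```shell
# Write a fresh random machine ID (32 lowercase hex digits plus a newline,
# the format described in machine-id(5)) to the given file.
# The default path is the real one; pass another path to try it safely.
write_new_machine_id() {
    target=${1:-/etc/machine-id}
    # 16 random bytes rendered as hex, with od's spacing stripped.
    new_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    [ "${#new_id}" -eq 32 ] || return 1
    printf '%s\n' "$new_id" > "$target"
}
```

Run as root inside each VM, `write_new_machine_id` would replace the shared ID. The VM would presumably still need to request a new lease afterwards (for example by rebooting) before its address changes.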

cfm, Jul 20 '22 19:07