Molecule scenarios fail with duplicate DHCP leases
Description
In testing #6493, the app and mon VMs created by molecule create -s libvirt-{staging,prod}-focal are given the same DHCP lease, which causes one or the other VM to be inaccessible and molecule converge to fail.
Steps to Reproduce
$ make staging
Expected Behavior
# journalctl -f
[...]
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:99:ac:6c app-staging
[...]
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.253 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.253 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.253 52:54:00:e8:72:f2 mon-staging
[...]
Actual Behavior
# journalctl -f
[...]
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:99:ac:6c
Jul 19 23:49:33 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:99:ac:6c app-staging
[...]
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPDISCOVER(virbr1) 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPOFFER(virbr1) 192.168.121.142 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPREQUEST(virbr1) 192.168.121.142 52:54:00:e8:72:f2
Jul 19 23:49:49 sd-staging dnsmasq-dhcp[39062]: DHCPACK(virbr1) 192.168.121.142 52:54:00:e8:72:f2 mon-staging
[...]
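To spot the collision without watching journalctl -f, the DHCPACK lines above can be filtered for any address that was handed to more than one MAC. This is a minimal sketch. It assumes dnsmasq keeps the DHCPACK(iface) IP MAC [hostname] format shown in these logs, and find_duplicate_leases is a hypothetical helper name:

```shell
#!/bin/sh
# Flag IP addresses that dnsmasq ACKed to more than one MAC address.
# Reads journal text on stdin; assumes the dnsmasq-dhcp line format
#   ... DHCPACK(iface) IP MAC [hostname]
find_duplicate_leases() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^DHCPACK\(/) {
                    ip = $(i + 1)
                    mac = $(i + 2)
                    key = ip SUBSEP mac
                    # Renewals repeat the same IP/MAC pair; count each pair once.
                    if (!(key in seen)) {
                        seen[key] = 1
                        macs[ip] = (count[ip] ? macs[ip] " " : "") mac
                        count[ip]++
                    }
                    break
                }
            }
        }
        END {
            # One line per address that more than one MAC was given.
            for (ip in count)
                if (count[ip] > 1)
                    printf "duplicate lease %s -> %s\n", ip, macs[ip]
        }
    '
}
```

For example, journalctl -t dnsmasq-dhcp --since today | find_duplicate_leases. It will also flag an address that was legitimately reused after a VM was destroyed and recreated with a new MAC, so treat hits as leads rather than proof.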
Comments
This appears to have started with the v202206.03.0 release of the bento/ubuntu-20.04 Vagrant box, and it goes away when the box is pinned to an earlier version:
$ vagrant box add --provider virtualbox bento/ubuntu-20.04 --box-version 202112.19.0
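Adding the older version doesn't remove v202206.03.0 if it's already installed, and Vagrant generally uses the newest installed version of a box unless a version constraint is set. So it may be worth checking what's installed before running make staging. This is a rough sketch that reads vagrant box list output. It assumes lines like bento/ubuntu-20.04 (virtualbox, 202206.03.0), with the version as the second field inside the parentheses, and check_bento_box is a hypothetical helper name:

```shell
#!/bin/sh
# Warn if bento/ubuntu-20.04 v202206.03.0 (the release where app and mon
# were seen sharing /etc/machine-id) is installed for any provider.
# Reads `vagrant box list` output on stdin; assumes lines such as
#   bento/ubuntu-20.04 (virtualbox, 202206.03.0)
check_bento_box() {
    if grep -q '^bento/ubuntu-20\.04 ([^,]*, 202206\.03\.0[,)]'; then
        echo "bento/ubuntu-20.04 202206.03.0 is installed; known-good pin is 202112.19.0" >&2
        return 1
    fi
    return 0
}
```

Usage: vagrant box list | check_bento_box. If it complains, vagrant box remove bento/ubuntu-20.04 --box-version 202206.03.0 should drop the affected version. You may also need --provider if the box is installed for more than one provider.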
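A few ephemeral test runs ago I discovered that, with bento/ubuntu-20.04 v202206.03.0, app and mon had identical /etc/machine-id files. systemd-networkd derives its default DHCP client identifier (DUID) from /etc/machine-id, so dnsmasq would see both VMs as the same client. That would make this not only expected but correct behavior as far as DHCP is concerned. I'll come back to this hypothesis after #6493.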
Reported upstream in chef/bento#1421. Workaround documented in freedomofpress/securedrop-docs#364.
Could we nuke/regenerate /etc/machine-id on the VM as part of provisioning?
Not with our existing Ansible playbooks. DHCP actually succeeds for both VMs, but the duplicate lease leads to undefined behavior like the following, which is what I tripped over in discovering this bug. Compare hostnames and IP addresses:
| What Ansible means to do | What Ansible actually does | What happens |
|---|---|---|
| ssh app "apt-get update" | ssh 192.168.121.142 "apt-get update" | ✓ |
| ssh mon "apt-get update" | ssh 192.168.121.142 "apt-get update" | ✗, apt-get lock contention |
So Ansible, along with anything else SSH-based, is out. We could probably add a vm.provision :shell item to the instance_raw_config_args passed to Vagrant. But this can be worked around by adding one argument to a shell command that developers (and only developers) already enter manually. So I thought we could start there, especially since upstream is likely to fix what seems to be a regression.
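If we do go the vm.provision :shell route, the inline script would need to do roughly the following. This is an untested sketch. regenerate_machine_id is a hypothetical helper, and it assumes a Linux guest: it takes a fresh random UUID from /proc/sys/kernel/random/uuid rather than calling systemd-machine-id-setup. The optional root argument exists only so it can be tried against a scratch directory. On its own it does not fix the lease. systemd-networkd presumably only picks up the new ID when it restarts, and restarting it (or rebooting) will probably move the guest to a new address underneath Vagrant's SSH session. That part would need testing.

```shell
#!/bin/sh
# Hypothetical provisioning step: give a guest its own /etc/machine-id.
# Optional $1 is a root directory (default "/"), for dry runs elsewhere.
regenerate_machine_id() {
    root="${1:-/}"
    root="${root%/}"              # "/" becomes "", "/tmp/x/" becomes "/tmp/x"
    id_file="$root/etc/machine-id"
    dbus_file="$root/var/lib/dbus/machine-id"

    # A machine ID is 32 lowercase hex digits; derive one from a fresh
    # random UUID by dropping its dashes.
    new_id=$(tr -d '-' < /proc/sys/kernel/random/uuid)
    [ -n "$new_id" ] || return 1
    printf '%s\n' "$new_id" > "$id_file"

    # Ubuntu usually links the D-Bus copy to /etc/machine-id; if it is a
    # separate file, replace it with that link so the two cannot diverge.
    if [ -d "$root/var/lib/dbus" ] && [ ! -L "$dbus_file" ]; then
        rm -f "$dbus_file"
        ln -s /etc/machine-id "$dbus_file"
    fi
}
```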