Multiple builders with Ansible galaxy roles race
This issue was originally opened by @mbrancato as hashicorp/packer#8827. It was migrated here as a result of the Packer plugin split. The original body of the issue is below.
### Overview of the Issue
When using multiple builders with the ansible provisioner and a `galaxy_file`, there appears to be a race condition in which multiple builders try to download the same role at the same time. This results in an error like the following:
```
==> azure-arm-two: Provisioning with Ansible...
==> azure-arm-one: Connected to SSH!
==> azure-arm-one: Provisioning with Ansible...
    azure-arm-one: Executing Ansible Galaxy
    azure-arm-two: Executing Ansible Galaxy
    azure-arm-one: - extracting <role name> to /root/.ansible/roles/<role name>
    azure-arm-one: - <role name> (<hash>) was installed successfully
    azure-arm-two: - extracting <role name> to /root/.ansible/roles/<role name>
    azure-arm-two: [WARNING]: - <role name> was NOT installed successfully: the specified role
    azure-arm-two: <role name> appears to already exist. Use --force to replace it.
    azure-arm-two: ERROR! - you can use --ignore-errors to skip failed roles and finish processing the list.
==> azure-arm-two: Provisioning step had errors: Running the cleanup provisioner, if present...
```
Interestingly, I think this only happens when both processes are extracting the role at the same time. If the builders are not near each other in the steps being performed, the second builder doesn't seem to care that the role already exists.
Note that the error actually comes from Ansible. It seems Packer would need to coordinate the use of ansible-galaxy across builders more tightly, so that the same install request is never invoked more than once at a time.
https://github.com/ansible/ansible/blob/c64202a49563fefb35bd8de59bceb0b3b2fa5fa1/lib/ansible/galaxy/role.py#L309
### Reproduction Steps
Use two builders with an ansible provisioner that specifies a `galaxy_file`.
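A minimal template to reproduce might look like the following sketch (builder names and file paths are placeholders, and the builder configuration itself is elided; any two builders that share an `ansible` provisioner with a `galaxy_file` and provision at the same time should do):

```hcl
# Placeholder sketch: two builders sharing one ansible provisioner with a
# galaxy_file. Both builders invoke ansible-galaxy against the same default
# roles path, which is what races.
build {
  sources = [
    "source.azure-arm.one", # hypothetical builder names
    "source.azure-arm.two",
  ]

  provisioner "ansible" {
    playbook_file = "./playbook.yml"     # placeholder path
    galaxy_file   = "./requirements.yml" # placeholder path
  }
}
```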
### Packer version
Packer v1.5.4
### Simplified Packer Buildfile
n/a
### Operating system and Environment details
Linux, amd64
### Log Fragments and crash.log files
n/a
Of note, this is particularly problematic and reproducible when using `galaxy_force_install = true`.
Ugh, I started hitting this too when I made some changes to the project that caused my vsphere-clone and virtualbox-ovf builders to provision around the same time. Perhaps I'll stick in a "sleep" provisioner to delay one of the builders, like this:
provisioner "shell" {
only = ["vsphere-clone.dev-env"]
inline_shebang = "/bin/bash -l"
inline = [
"echo Delaying the vSphere provisioner a bit to prevent ansible-galaxy collisions.",
"echo This is a workaround for https://github.com/hashicorp/packer-plugin-ansible/issues/21...",
"sleep 30"
]
}
Assuming Packer builder environments do NOT provide any separation of Ansible workspaces, another workaround would be to define a separate provisioner for each builder using only/except, giving each one a different `roles_path` and `collections_path`. In theory, this tells the separate Ansible processes (spawned by separate Packer builder processes) to install roles and collections from ansible-galaxy into distinct folders. (It also tells the Ansible processes where to look for roles/collections at runtime.)
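Concretely, that separation might look like the sketch below. The builder names come from my setup above; the playbook, requirements, and `.galaxy/...` paths are placeholders, and this assumes the provisioner's `roles_path`/`collections_path` options control where ansible-galaxy installs to:

```hcl
# Hypothetical sketch: one ansible provisioner per builder, each with its own
# galaxy install paths, so parallel ansible-galaxy runs never touch the same
# directory.
provisioner "ansible" {
  only             = ["vsphere-clone.dev-env"]
  playbook_file    = "./playbook.yml"
  galaxy_file      = "./requirements.yml"
  roles_path       = "./.galaxy/vsphere/roles"
  collections_path = "./.galaxy/vsphere/collections"
}

provisioner "ansible" {
  only             = ["virtualbox-ovf.dev-env"]
  playbook_file    = "./playbook.yml"
  galaxy_file      = "./requirements.yml"
  roles_path       = "./.galaxy/virtualbox/roles"
  collections_path = "./.galaxy/virtualbox/collections"
}
```

The duplication is ugly, but the two ansible-galaxy invocations can no longer collide because they write to disjoint paths.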
Seems like that would work, but man what a mess to have to implement as a workaround. I'm all ears if someone else has ideas.
Yeah, I still have this issue. It tries to (forcibly) install a role that is in use by another build, and omitting `galaxy_force_install` results in errors about the role already being present. I've solved this by adding the roles to the actual repository and not using ansible-galaxy to install them.
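For reference, the vendored-roles approach might look like this (paths are placeholders): with the roles committed next to the playbook and no `galaxy_file` set, ansible-galaxy is never invoked, so there is nothing left to race.

```hcl
# Hypothetical layout with roles vendored into the repository:
#
#   ansible/
#     playbook.yml
#     roles/
#       some-role/   <- committed, not installed by ansible-galaxy
#
# With no galaxy_file, the provisioner skips the "Executing Ansible Galaxy"
# step, and Ansible resolves roles from the playbook-adjacent roles/ directory.
provisioner "ansible" {
  playbook_file = "./ansible/playbook.yml"
}
```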