packer-plugin-ansible

Multiple builders with Ansible galaxy roles race

Open ghost opened this issue 5 years ago • 3 comments

This issue was originally opened by @mbrancato as hashicorp/packer#8827. It was migrated here as a result of the Packer plugin split. The original body of the issue is below.



Overview of the Issue

When using multiple builders with the ansible provisioner and a galaxy_file, there seems to be a race condition where multiple builders may try to download the role at the same time. This results in an error like the following:

==> azure-arm-two: Provisioning with Ansible...
==> azure-arm-one: Connected to SSH!
==> azure-arm-one: Provisioning with Ansible...
    azure-arm-one: Executing Ansible Galaxy
    azure-arm-two: Executing Ansible Galaxy
    azure-arm-one: - extracting <role name> to /root/.ansible/roles/<role name>
    azure-arm-one: - <role name> (<hash>) was installed successfully
    azure-arm-two: - extracting <role name> to /root/.ansible/roles/<role name>
    azure-arm-two: [WARNING]: - <role name> was NOT installed successfully: the specified role
    azure-arm-two: <role name> appears to already exist. Use --force to replace it.
    azure-arm-two: ERROR! - you can use --ignore-errors to skip failed roles and finish processing the list.
==> azure-arm-two: Provisioning step had errors: Running the cleanup provisioner, if present...

Interestingly, I think this only happens when both processes are extracting the role at the same time. If the builders are not close to each other in the steps being performed, the second builder doesn't seem to care that the role already exists.

Note that the error actually comes from Ansible. It seems like Packer would need to coordinate the use of ansible-galaxy between builders more tightly and not run the same install request more than once at a time. See https://github.com/ansible/ansible/blob/c64202a49563fefb35bd8de59bceb0b3b2fa5fa1/lib/ansible/galaxy/role.py#L309

Reproduction Steps

Use two builders with an ansible galaxy file.

Packer version

Packer v1.5.4

Simplified Packer Buildfile

n/a

Operating system and Environment details

Linux, amd64

Log Fragments and crash.log files

n/a

ghost avatar Apr 16 '21 18:04 ghost

Of note, this is particularly problematic and reproducible when using galaxy_force_install = true

mbainter avatar Apr 21 '21 22:04 mbainter

Ugh, I started hitting this too when I made some changes to the project that caused my vsphere-clone and virtualbox-ovf builders to provision around the same time. Perhaps I'll stick in a "sleep" provisioner to delay one of the builders, like this:

provisioner "shell" {
  only           = ["vsphere-clone.dev-env"]
  inline_shebang = "/bin/bash -l"
  inline = [
    "echo Delaying the vSphere provisioner a bit to prevent ansible-galaxy collisions.",
    "echo This is a workaround for https://github.com/hashicorp/packer-plugin-ansible/issues/21...",
    "sleep 30"
  ]
}

Assuming Packer builder environments do NOT provide any separation of Ansible workspaces, another workaround would be to split the provisioners per builder using only/except and give each one a different roles_path and collections_path. In theory, this would make the separate ansible processes (spawned by separate Packer builder processes) install roles and collections from ansible-galaxy into distinct folders. (It also tells the ansible processes where to look for roles/collections at runtime.)
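A rough sketch of that idea, untested against this race. It uses the ansible provisioner's roles_path and collections_path options; the virtualbox-ovf source name, the playbook and requirements file names, and the directories under ./.galaxy are placeholders, and the source blocks themselves are left out:

```hcl
# Sketch only: the source names, file names, and ./.galaxy directories are
# hypothetical placeholders. The source blocks are omitted.
build {
  sources = [
    "source.vsphere-clone.dev-env",
    "source.virtualbox-ovf.dev-env",
  ]

  # One ansible provisioner per builder, each installing galaxy content
  # into its own directories so the two installs never share a target path.
  provisioner "ansible" {
    only             = ["vsphere-clone.dev-env"]
    playbook_file    = "./playbook.yml"
    galaxy_file      = "./requirements.yml"
    roles_path       = "./.galaxy/vsphere/roles"
    collections_path = "./.galaxy/vsphere/collections"
  }

  provisioner "ansible" {
    only             = ["virtualbox-ovf.dev-env"]
    playbook_file    = "./playbook.yml"
    galaxy_file      = "./requirements.yml"
    roles_path       = "./.galaxy/virtualbox/roles"
    collections_path = "./.galaxy/virtualbox/collections"
  }
}
```

The cost is that every role and collection gets downloaded once per builder, and each new builder needs its own copy of the provisioner block.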

Seems like that would work, but man what a mess to have to implement as a workaround. I'm all ears if someone else has ideas.

timblaktu avatar May 04 '21 21:05 timblaktu

Yeah, I still have this issue. It tries to (forcibly) install a role that is in use by another build. Omitting galaxy_force_install results in errors about a role already being present. I've solved this by adding the roles to the actual repository and not using ansible-galaxy to install them.
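For illustration, a minimal sketch of what the vendored setup might look like, assuming the roles are committed under a roles/ directory next to the playbook (where Ansible looks for roles by default). The source name, file names, and role name below are placeholders:

```hcl
# Sketch only: the source name, playbook path, and role name are
# hypothetical placeholders.
#
# Assumed repository layout:
#   ansible/playbook.yml
#   ansible/roles/<role name>/...   (committed to the repository)
build {
  sources = ["source.azure-arm.one", "source.azure-arm.two"]

  # No galaxy_file here, so the provisioner has nothing to hand to
  # ansible-galaxy and the builders never install roles concurrently.
  provisioner "ansible" {
    playbook_file = "./ansible/playbook.yml"
  }
}
```

The trade-off is that role updates become manual commits instead of being pulled from Galaxy at build time.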

sdejong629 avatar Nov 28 '23 10:11 sdejong629