
[Bug] Compute instance always reports as changed

Open tcinbis opened this issue 4 years ago • 1 comments

SUMMARY

Creating a compute instance reports back as changed even if it was already created in a previous run. This is non-idempotent behaviour, which is not expected of Ansible modules unless the documentation says otherwise.

This was tested against master and version 1.0.1 of this collection.

The issue is related to #257 but that one was closed by the author without solving the root issue. Pinging @Rylon who was also involved in the previous issue.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

gcp_compute_instance.py

ANSIBLE VERSION
ansible 2.10.2
  config file = /home/user/git-repo/policy/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/git-repo/policy/venv/lib/python3.7/site-packages/ansible
  executable location = /home/user/git-repo/policy/venv/bin/ansible
  python version = 3.7.8 (default, Jun 29 2020, 05:44:46) [GCC 7.5.0]
CONFIGURATION
DEFAULT_HOST_LIST(/home/user/git-repo/policy/ansible.cfg) = ['/home/user/git-repo/policy/inventories']
DEFAULT_REMOTE_USER(/home/user/git-repo/policy/ansible.cfg) = ans
DEFAULT_ROLES_PATH(/home/user/git-repo/policy/ansible.cfg) = ['/home/user/.ansible/roles', '/usr/share/ansible/roles', '/etc/ansible/roles']
DEFAULT_VAULT_IDENTITY_LIST(/home/user/git-repo/policy/ansible.cfg) = ['[email protected]_pass.production', '[email protected]_pass.testing', '[email protected]_pass.development']
INTERPRETER_PYTHON(/home/user/git-repo/policy/ansible.cfg) = auto
INVENTORY_ENABLED(/home/user/git-repo/policy/ansible.cfg) = ['host_list', 'script', 'auto', 'yaml', 'ini', 'toml', 'gcp_compute']
OS / ENVIRONMENT

Running on Ubuntu 18.04.

STEPS TO REPRODUCE

Taken from the Ansible documentation with minor modifications.

#!ansible-playbook
---
- name: Create an instance
  hosts: localhost
  gather_facts: no
  vars:
      gcp_project: your-project
      gcp_cred_kind: application
      zone: "us-central1-a"
      region: "us-central1"

  tasks:
   - name: create a disk
     gcp_compute_disk:
         name: 'disk-instance'
         size_gb: 20
         source_image: 'projects/ubuntu-os-cloud/global/images/family/ubuntu-1604-lts'
         zone: "{{ zone }}"
         project: "{{ gcp_project }}"
         auth_kind: "{{ gcp_cred_kind }}"
         scopes:
           - https://www.googleapis.com/auth/compute
         state: present
     register: disk
   - name: create a address
     gcp_compute_address:
         name: 'address-instance'
         region: "{{ region }}"
         project: "{{ gcp_project }}"
         auth_kind: "{{ gcp_cred_kind }}"
         scopes:
           - https://www.googleapis.com/auth/compute
         state: present
     register: address
   - name: create a instance
     gcp_compute_instance:
         state: present
         name: test-vm
         machine_type: n1-standard-1
         disks:
           - auto_delete: true
             boot: true
             source: "{{ disk }}"
         network_interfaces:
             - network: null # use default
               access_configs:
                 - name: 'External NAT'
                   nat_ip: "{{ address }}"
                   type: 'ONE_TO_ONE_NAT'
         zone: "{{ zone }}"
         project: "{{ gcp_project }}"
         auth_kind: "{{ gcp_cred_kind }}"
         scopes:
           - https://www.googleapis.com/auth/compute
     register: instance

   - name: Wait for SSH to come up
     wait_for: host={{ address.address }} port=22 delay=10 timeout=60
EXPECTED RESULTS

In the first run all create tasks should be listed as changed in the log, but in a subsequent run these tasks should all report ok and not changed. This is the case for the modules gcp_compute_address and gcp_compute_disk but not for gcp_compute_instance.

Below are two hypothetical runs showing the expected behaviour.

First run:

./gcp-issue.yml

PLAY [Create an instance] *****************************************************************************************

TASK [create a disk] **********************************************************************************************
changed: [localhost]

TASK [create a address] *******************************************************************************************
changed: [localhost]

TASK [create a instance] ******************************************************************************************
changed: [localhost]

TASK [Wait for SSH to come up] ************************************************************************************
ok: [localhost]

PLAY RECAP ********************************************************************************************************
localhost                  : ok=4    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Second run (right after without changing anything):

./gcp-issue.yml

PLAY [Create an instance] *****************************************************************************************

TASK [create a disk] **********************************************************************************************
ok: [localhost]

TASK [create a address] *******************************************************************************************
ok: [localhost]

TASK [create a instance] ******************************************************************************************
ok: [localhost]

TASK [Wait for SSH to come up] ************************************************************************************
ok: [localhost]

PLAY RECAP ********************************************************************************************************
localhost                  : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  

NOTE: The difference between the two runs should be that the task create instance is not marked as changed.
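The idempotence check described above can be automated by reading the PLAY RECAP of the second run. The following is a minimal sketch, assuming the recap format shown in the logs above; `parse_recap` and `is_idempotent` are hypothetical helper names and not part of Ansible or this collection.

```python
import re

# Matches a recap line such as:
# "localhost : ok=4 changed=0 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output):
    """Return {host: {counter: int}} parsed from the PLAY RECAP section."""
    results = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if in_recap:
            match = RECAP_LINE.match(line.strip())
            if match:
                pairs = (pair.split("=") for pair in match["counters"].split())
                results[match["host"]] = {key: int(val) for key, val in pairs}
    return results


def is_idempotent(second_run_output):
    """True if a recap was found and no host reported a changed task."""
    recap = parse_recap(second_run_output)
    return bool(recap) and all(c.get("changed") == 0 for c in recap.values())
```

Passing the captured output of the second playbook run to `is_idempotent` returns True only when every host reports changed=0.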

ACTUAL RESULTS

What happens instead is that the task create instance is always marked as changed even if the VM was just created in the previous run and no modifications were performed. See the log output with increased verbosity as a gist here.

Further digging into gcp_compute_instance.py reveals that the is_different() method does not work properly. To be more precise, it correctly reports that the request and response differ syntactically, but the two should be treated as equivalent.

For example, the module uses the URL prefix https://compute.googleapis.com/ for the machine type, whereas the response uses https://www.googleapis.com/. The requested and response values, contained in their respective dictionaries (request_vals, response_vals), are listed below.

Requested values:

{
    'disks': [{
        'autoDelete': True,
        'boot': True,
        'source': 'https://www.googleapis.com/compute/v1/projects/your-project/zones/us-central1-a/disks/disk-instance'
    }],
    'machineType': 'https://compute.googleapis.com/compute/v1/projects/your-project/zones/us-central1-a/machineTypes/n1-standard-1',
    'name': 'test-vm',
    'networkInterfaces': [{
        'accessConfigs': [{
            'name': 'External NAT',
            'natIP': '34.123.210.249',
            'type': 'ONE_TO_ONE_NAT'
        }]
    }]
}

Response values:

{
    'disks': [{
        'autoDelete': True,
        'boot': True,
        'source': 'https://www.googleapis.com/compute/v1/projects/your-project/zones/us-central1-a/disks/disk-instance'
    }],
    'machineType': 'https://www.googleapis.com/compute/v1/projects/your-project/zones/us-central1-a/machineTypes/n1-standard-1',
    'name': 'test-vm',
    'networkInterfaces': [{
        'accessConfigs': [{
            'name': 'External NAT',
            'natIP': '34.123.210.249',
            'type': 'ONE_TO_ONE_NAT',
            'networkTier': 'PREMIUM'
        }],
        'network': 'https://www.googleapis.com/compute/v1/projects/your-project/global/networks/default',
        'networkIP': '10.128.0.34',
        'subnetwork': 'https://www.googleapis.com/compute/v1/projects/your-project/regions/us-central1/subnetworks/default'
    }]
}

From here onwards I am not sure how to proceed. One could update the module to ignore certain values, or introduce equivalence mappings between certain values (e.g. for the URL prefixes). I would appreciate some pointers in the right direction, or a statement on whether this behaviour is anticipated, as you are probably facing similar behaviour internally as well.
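To make the two options concrete, here is a minimal sketch of a comparison that ignores response-only keys and treats the two API hosts as equivalent. It is not the module's is_different() implementation; `EQUIVALENT_HOSTS`, `normalize` and `is_subset` are hypothetical names, and lists are compared position by position as a simplifying assumption.

```python
from urllib.parse import urlparse

# Hosts treated as interchangeable in this sketch (an assumption, not module behaviour).
EQUIVALENT_HOSTS = {"www.googleapis.com", "compute.googleapis.com"}


def normalize(value):
    """Reduce URLs on known-equivalent API hosts to their path component."""
    if isinstance(value, str):
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc in EQUIVALENT_HOSTS:
            return parsed.path
    return value


def is_subset(request, response):
    """Return True if every requested value is matched in the response.

    Keys present only in the response (server-side defaults such as
    'networkTier') are ignored.
    """
    if isinstance(request, dict):
        return isinstance(response, dict) and all(
            key in response and is_subset(val, response[key])
            for key, val in request.items()
        )
    if isinstance(request, list):
        return (
            isinstance(response, list)
            and len(request) == len(response)
            and all(is_subset(req, res) for req, res in zip(request, response))
        )
    return normalize(request) == normalize(response)
```

With the dictionaries listed above, `is_subset(request_vals, response_vals)` evaluates to True, while a request for a different machine type would still be detected as a difference.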

Thanks!

tcinbis avatar Nov 12 '20 12:11 tcinbis

Every elephant as he grows, Learns to keep on his toes In his element as he goes, Bump Bump Bumpety Bump... BUMP

BUMP

Stolen from @neilmartin83

CC @rambleraptor

tcinbis avatar Dec 03 '20 07:12 tcinbis

2 years later this issue is still present ☹️

Semmu avatar Dec 11 '22 15:12 Semmu

> 2 years later this issue is still present ☹️

Hello! Would you mind trying this with the 1.1.0-beta0 release?

There was previously a perma-diff in compute instance and several other resources. It should have been fixed with: https://github.com/ansible-collections/google.cloud/commit/0fc41bbda4f16fe73edffb08e51d9435262c7b47.

There's an integration test that passes for this as well.

toumorokoshi avatar Dec 12 '22 17:12 toumorokoshi

I'll close this for now as I can't reproduce, and there's a passing integration test to validate this.

Taking a look at the example, the diff is coming from the domain difference (www.googleapis.com vs compute.googleapis.com), which was precisely the bug fixed in the hash above.

Feel free to ping me if you do have a repro, even with the latest code.

toumorokoshi avatar Dec 16 '22 00:12 toumorokoshi

I have found two cases where this behavior appears:

  1. Imagine the following task sequence:
- name: Create VM
  gcp_compute_instance:
    ...
    machine_type: big-machine-type

- name: Downsize VM to save money
  shell: |
    gcloud <shutdown VM>
    gcloud <update-machine-type>
    gcloud <start VM>

If you run the above twice, the first task will return as changed on the second run, probably due to the mismatch in the machine type. This is not expected, as gcp_compute_instance does not update the machine type when the real value differs from the one given in the Ansible module invocation. I don't know which is the bug: the fact that it returns as changed, or the fact that machine_type does not get updated.

  2. If the network is specified like this:
- gcp_compute_instance:
    ...
    network_interfaces:
      - network:
          selfLink: global/networks/my-network
    ....

The above works fine otherwise, but the task will always return as changed. To make it not return as changed, the selfLink needs to be defined as the full URL:

selfLink: https://www.googleapis.com/compute/v1/projects/en2720-2017/global/networks/my-network

The above cases are not covered by the tests.
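For the second case, one possible approach is to expand a partial selfLink to its full URL before comparing it with the API response. This is a minimal sketch under that assumption, not the module's code; `API_BASE` and `expand_self_link` are hypothetical names, and the base URL is the prefix that appears in the response values earlier in this issue.

```python
# Prefix taken from the response values shown earlier in this issue (assumed stable).
API_BASE = "https://www.googleapis.com/compute/v1"


def expand_self_link(link, project):
    """Expand a partial resource path into a full URL; leave full URLs untouched."""
    if link.startswith(("http://", "https://")):
        return link
    link = link.lstrip("/")
    # Partial paths such as 'global/networks/my-network' lack the project prefix.
    if not link.startswith("projects/"):
        link = f"projects/{project}/{link}"
    return f"{API_BASE}/{link}"
```

For example, `expand_self_link("global/networks/my-network", "en2720-2017")` yields the full URL given above, so the requested and returned values would compare equal.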

nkakouros avatar Jan 11 '23 23:01 nkakouros