
Server error due to stale cache (?) when creating multiple compute_profiles with VMware

Open parmstro opened this issue 1 year ago • 8 comments

SUMMARY

FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}

ISSUE TYPE
  • Bug Report
ANSIBLE VERSION
ansible --version
ansible [core 2.14.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ansiblerunner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/ansiblerunner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.2 (main, May 24 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3.11)
  jinja version = 3.1.2
  libyaml = True

COLLECTION VERSION
Collection                  Version
--------------------------- -------
amazon.aws                  6.2.0  
ansible.controller          4.4.0  
ansible.netcommon           5.1.2  
ansible.posix               1.5.4  
ansible.utils               2.10.3 
azure.azcollection          1.16.0 
community.aws               6.1.0  
community.crypto            2.14.1 
community.general           7.1.0  
community.vmware            3.7.0  
containers.podman           1.10.2 
infra.ah_configuration      1.1.1  
redhat.redhat_csp_download  1.2.2  
redhat.rhel_idm             1.11.0 
redhat.rhel_system_roles    1.21.2 
redhat.satellite            3.12.0 
redhat.satellite_operations 1.3.0  

# /usr/share/ansible/collections/ansible_collections
Collection               Version
------------------------ -------
redhat.rhel_system_roles 1.21.1 

KATELLO/FOREMAN VERSION
foreman-3.5.1.19-1.el8sat.noarch
STEPS TO REPRODUCE
# var file
# compute_profiles
compute_profiles_mandatory:
  - name: "SOE_Small"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 1
          corespersocket: 1
          memory_mb: 4096
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 65
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"

  - name: "SOE_Medium"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 1
          corespersocket: 1
          memory_mb: 8192
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 100
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"

  - name: "SOE_Large"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 2
          corespersocket: 1
          memory_mb: 16364
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 100
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"
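(Aside: the three profiles above differ only in cpus, memory_mb, and the volume's size_gb, so the same var file could be written more compactly with YAML anchors and merge keys. This is a hypothetical sketch using the same values, not part of the repro; merge keys are shallow, so the nested size_gb override needs its own anchor.)

```yaml
# Hypothetical compact equivalent of the var file above, using YAML
# anchors and merge keys. Values are copied from the original repro.
_soe_volume_defaults: &soe_volume_defaults
  thin: true
  name: "Hard disk"
  mode: "persistent"
  controller_key: 1000
  datastore: "NASAEX_VMS"

_soe_vm_defaults: &soe_vm_defaults
  cpus: 1
  corespersocket: 1
  cluster: "NUCLab"
  path: "/Datacenters/example.ca/vm"
  guest_id: "rhel8_64Guest"
  hardware_version: "Default"
  memoryHotAddEnabled: true
  cpuHotAddEnabled: true
  add_cdrom: false
  boot_order:
    - "network"
    - "disk"
  scsi_controllers:
    - type: ParaVirtualSCSIController
      key: 1000
  interfaces_attributes:
    0:
      type: "VirtualVmxnet3"
      network: "VM Network"

compute_profiles_mandatory:
  - name: "SOE_Small"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          <<: *soe_vm_defaults
          memory_mb: 4096
          volumes_attributes:
            0:
              <<: *soe_volume_defaults
              size_gb: 65
  - name: "SOE_Medium"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          <<: *soe_vm_defaults
          memory_mb: 8192
          volumes_attributes:
            0:
              <<: *soe_volume_defaults
              size_gb: 100
  - name: "SOE_Large"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          <<: *soe_vm_defaults
          cpus: 2
          memory_mb: 16364
          volumes_attributes:
            0:
              <<: *soe_volume_defaults
              size_gb: 100
```

Ansible's YAML loader (PyYAML) supports `<<:` merge keys, but this sketch is untested against the module.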

# playbook
---
- name: "Test Task"
  hosts: sat.example.ca
  become: true
  gather_facts: true
  vars_files:
    - "whatever_you_name_the_var_file_above.yml"
    - "your_vault_file.yml"

  tasks:

  - name: "Test the specified task"
    ansible.builtin.include_tasks: roles/satellite_post/tasks/{{ test_task_name }}.yml

# task file - create_mandatory_compute_profiles.yml
---
- name: "Configure the mandatory compute profiles"
  ansible.builtin.include_tasks: ensure_compute_profile.yml
  loop: "{{ compute_profiles_mandatory }}"
  loop_control:
    loop_var: cpr
  when: "compute_profiles_mandatory is defined"


# task file - ensure_compute_profile.yml
---
- name: "Ensure the compute profile state - {{ cpr.name }}"
  redhat.satellite.compute_profile:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "{{ cpr.name }}"
    updated_name: "{{ cpr.updated_name | default(omit) }}"
    state: "{{ cpr.state | default(omit) }}"
    compute_attributes: "{{ cpr.compute_attributes | default(omit) }}"
EXPECTED RESULTS

No errors, all profiles created successfully.

ACTUAL RESULTS
2023-10-15 07:00:04,535 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Small', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 4096, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 65}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,564 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Medium', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,591 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Large', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 4, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Small] *************************************************************************************************************
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:01:08,215 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Medium] ************************************************************************************************************
2023-10-15 07:01:08,216 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Large] *************************************************************************************************************
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | fatal: [sat.example.ca]: FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}

NOTE: If I call the cache refresh API before creating the profiles, everything works. However, querying the Satellite for the information needed to call the cache refresh can hit the same error. This is a real bummer when you are an hour and a half into a build and things bomb out. I am currently testing with the compute resource created with caching disabled.

It would be nice to have a cache refresh run in the background, as this is an automation task and not a UI interaction; no one is waiting and watching.

parmstro avatar Oct 16 '23 20:10 parmstro

What is the cache refresh API?! :)

evgeni avatar Oct 16 '23 20:10 evgeni

- name: "Get the compute resource id"
  redhat.satellite.compute_resource:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "{{ cpr.compute_attributes[0].compute_resource }}"
    state: "present"
  register: result

- ansible.builtin.set_fact:
    cr_id: "{{ result.entity.compute_resources[0].id }}"

- name: "Force refresh of Compute Resource API cache"
  ansible.builtin.uri:
    url: "{{ satellite_url }}/api/compute_resources/{{ cr_id }}-{{ cpr.compute_attributes[0].compute_resource }}/refresh_cache"
    method: "PUT"
    body_format: "json"
    user: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    force_basic_auth: true
    validate_certs: "{{ satellite_validate_certs }}"
  register: refresh_result

- ansible.builtin.debug:
    var: refresh_result.json.message

parmstro avatar Oct 16 '23 20:10 parmstro

Wait, you create a fresh CR and right after that the cache is invalid? That sounds like a Foreman bug, not something we should (have to) work around in FAM.

Interestingly, https://github.com/theforeman/foreman/blob/develop/app/models/concerns/compute_resource_caching.rb only calls a refresh automatically after_update, but not after_create (or after_save which would contain both). The original refresh was added in https://projects.theforeman.org/issues/19506 / https://github.com/theforeman/foreman/pull/4524
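The distinction matters because of ActiveRecord callback semantics: after_update fires only when an already-persisted record is saved again, while after_save fires on both the initial create and every later update. A toy, self-contained illustration (invented class, not Foreman code):

```ruby
# Toy demo of the callback semantics discussed above. ToyRecord is an
# invented class, not ActiveRecord: it just records which hooks would
# fire on each save.
class ToyRecord
  attr_reader :fired

  def initialize
    @persisted = false
    @fired = []
  end

  def save
    creating = !@persisted
    @persisted = true
    @fired << :after_update unless creating  # fires on update only
    @fired << :after_save                    # fires on create AND update
  end
end

r = ToyRecord.new
r.save  # create: only after_save fires, so an after_update cache refresh stays cold
r.save  # update: both fire
```

So a cache refresh hooked on after_update never runs for a freshly created CR, which is consistent with the failure showing up right after CR creation.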

Wonder what @ares thinks about this.

evgeni avatar Oct 17 '23 08:10 evgeni

That patch was created only to solve the issue of updating the CR. I'm not sure how the cache could be invalid right after the CR creation, but if it helps, I think replacing after_update with after_save is a good move. There was no deeper logic behind not doing it after CR creation; it just felt unnecessary.

ares avatar Oct 17 '23 08:10 ares

Yeah, I am too curious how this ended up with a "bad" cache, but here we are.

@parmstro if the issue is sufficiently reproducible in your env, could you try patching it to use after_save instead of after_update and see if it makes anything better?

evgeni avatar Oct 17 '23 09:10 evgeni

Yes. I will patch it to use after_save and set caching_enabled to true for my next test run. Please note: with only caching_enabled and no code to call cache_refresh, the builder creates the CR, gets through a couple of CPs, and then emits the error. This is very reproducible. With caching_enabled and the code to call cache_refresh (the code queries the API for the compute resource ID so we can use it in the call), it errors right after the call that creates the CR.

parmstro avatar Oct 17 '23 12:10 parmstro

Could you by any chance provide access to a reproducer system?

evgeni avatar Oct 17 '23 17:10 evgeni

The systems are built and torn down constantly. I would have to spin one up for you, but that can be done. Let me see if I can work on it. I am in the middle of a test right now with caching_enabled: false. The environment build is past the compute_resource creation and is actively using it to build systems. I am switching it back for the next run and making the edit that you requested in comment 5. Currently building tang hosts, so the environment should be finished in about 90 minutes or so.

parmstro avatar Oct 17 '23 18:10 parmstro