foreman-ansible-modules
Server error due to stale cache (?) when creating multiple compute_profiles with VMware
SUMMARY
FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}
ISSUE TYPE
- Bug Report
ANSIBLE VERSION
ansible --version
ansible [core 2.14.2]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/ansiblerunner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.11/site-packages/ansible
ansible collection location = /home/ansiblerunner/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.11.2 (main, May 24 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3.11)
jinja version = 3.1.2
libyaml = True
COLLECTION VERSION
Collection Version
--------------------------- -------
amazon.aws 6.2.0
ansible.controller 4.4.0
ansible.netcommon 5.1.2
ansible.posix 1.5.4
ansible.utils 2.10.3
azure.azcollection 1.16.0
community.aws 6.1.0
community.crypto 2.14.1
community.general 7.1.0
community.vmware 3.7.0
containers.podman 1.10.2
infra.ah_configuration 1.1.1
redhat.redhat_csp_download 1.2.2
redhat.rhel_idm 1.11.0
redhat.rhel_system_roles 1.21.2
redhat.satellite 3.12.0
redhat.satellite_operations 1.3.0
# /usr/share/ansible/collections/ansible_collections
Collection Version
------------------------ -------
redhat.rhel_system_roles 1.21.1
KATELLO/FOREMAN VERSION
foreman-3.5.1.19-1.el8sat.noarch
STEPS TO REPRODUCE
# var file
# compute_profiles
compute_profiles_mandatory:
- name: "SOE_Small"
compute_attributes:
- compute_resource: "VMware_Lab"
vm_attrs:
cpus: 1
corespersocket: 1
memory_mb: 4096
cluster: "NUCLab"
# resource_pool: "Resources"
path: "/Datacenters/example.ca/vm"
guest_id: "rhel8_64Guest"
hardware_version: "Default"
memoryHotAddEnabled: true
cpuHotAddEnabled: true
add_cdrom: false
boot_order:
- "network"
- "disk"
scsi_controllers:
- type: ParaVirtualSCSIController
key: 1000
volumes_attributes:
0:
thin: true
name: "Hard disk"
mode: "persistent"
controller_key: 1000
datastore: "NASAEX_VMS"
size_gb: 65
interfaces_attributes:
0:
type: "VirtualVmxnet3"
network: "VM Network"
- name: "SOE_Medium"
compute_attributes:
- compute_resource: "VMware_Lab"
vm_attrs:
cpus: 1
corespersocket: 1
memory_mb: 8192
cluster: "NUCLab"
# resource_pool: "Resources"
path: "/Datacenters/example.ca/vm"
guest_id: "rhel8_64Guest"
hardware_version: "Default"
memoryHotAddEnabled: true
cpuHotAddEnabled: true
add_cdrom: false
boot_order:
- "network"
- "disk"
scsi_controllers:
- type: ParaVirtualSCSIController
key: 1000
volumes_attributes:
0:
thin: true
name: "Hard disk"
mode: "persistent"
controller_key: 1000
datastore: "NASAEX_VMS"
size_gb: 100
interfaces_attributes:
0:
type: "VirtualVmxnet3"
network: "VM Network"
- name: "SOE_Large"
compute_attributes:
- compute_resource: "VMware_Lab"
vm_attrs:
cpus: 2
corespersocket: 1
memory_mb: 16364
cluster: "NUCLab"
# resource_pool: "Resources"
path: "/Datacenters/example.ca/vm"
guest_id: "rhel8_64Guest"
hardware_version: "Default"
memoryHotAddEnabled: true
cpuHotAddEnabled: true
add_cdrom: false
boot_order:
- "network"
- "disk"
scsi_controllers:
- type: ParaVirtualSCSIController
key: 1000
volumes_attributes:
0:
thin: true
name: "Hard disk"
mode: "persistent"
controller_key: 1000
datastore: "NASAEX_VMS"
size_gb: 100
interfaces_attributes:
0:
type: "VirtualVmxnet3"
network: "VM Network"
# playbook
---
- name: "Test Task"
hosts: sat.example.ca
become: true
gather_facts: true
vars_files:
- "whatever_you_name_the_var_file_above.yml"
- "your_vault_file.yml"
tasks:
- name: "Test the specified task"
ansible.builtin.include_tasks: roles/satellite_post/tasks/{{ test_task_name }}.yml
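The task file to include is chosen via the test_task_name variable, so a run presumably looks something like the line below (the exact invocation is not part of this report and is only an assumption):
ansible-playbook whatever_you_name_the_playbook_above.yml -e test_task_name=create_mandatory_compute_profiles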
# task file - create_mandatory_compute_profiles.yml
---
- name: "Configure the mandatory compute profiles"
ansible.builtin.include_tasks: ensure_compute_profile.yml
loop: "{{ compute_profiles_mandatory }}"
loop_control:
loop_var: cpr
when: "compute_profiles_mandatory is defined"
# task file - ensure_compute_profile.yml
---
- name: "Ensure the compute profile state - {{cpr.name}}"
redhat.satellite.compute_profile:
username: "{{ satellite_admin_username }}"
password: "{{ satellite_admin_password }}"
server_url: "{{ satellite_url }}"
validate_certs: "{{ satellite_validate_certs }}"
name: "{{ cpr.name }}"
updated_name: "{{ cpr.updated_name | default(omit) }}"
state: "{{ cpr.state | default(omit) }}"
compute_attributes: "{{ cpr.compute_attributes | default(omit) }}"
EXPECTED RESULTS
No errors, all profiles created successfully.
ACTUAL RESULTS
2023-10-15 07:00:04,535 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Small', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 4096, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 65}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,564 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Medium', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,591 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Large', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 4, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Small] *************************************************************************************************************
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:01:08,215 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Medium] ************************************************************************************************************
2023-10-15 07:01:08,216 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Large] *************************************************************************************************************
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | fatal: [sat.example.ca]: FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}
NOTE: If I call the cache_refresh API before creating the profiles, everything works. However, I have run into the problem that even querying the Satellite for the information I need to call the cache refresh can hit the same error. This is a real bummer when you are an hour and a half into a build and things bomb out. I am currently testing with the compute resource created with caching turned off.
It would be nice to have a cache refresh run automatically in the background, since this is an automation task and not a UI interaction. No one is waiting and watching.
What is the cache refresh API?! :)
- name: "Get the compute resource id"
redhat.satellite.compute_resource:
username: "{{ satellite_admin_username }}"
password: "{{ satellite_admin_password }}"
server_url: "{{ satellite_url }}"
validate_certs: "{{ satellite_validate_certs }}"
name: "{{ cpr.compute_attributes[0].compute_resource }}"
state: "present"
register: result
- ansible.builtin.set_fact:
cr_id: "{{ result.entity.compute_resources[0].id }}"
- name: "Force refresh of Compute Resource API cache"
ansible.builtin.uri:
url: "{{ satellite_url }}/api/compute_resources/{{ cr_id }}-{{ cpr.compute_attributes[0].compute_resource }}/refresh_cache"
method: "PUT"
body_format: "json"
user: "{{ satellite_admin_username }}"
password: "{{ satellite_admin_password }}"
force_basic_auth: true
validate_certs: "{{ satellite_validate_certs }}"
register: refresh_result
- debug:
var: refresh_result.json.message
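Since the extra compute_resource lookup can itself hit the stale-cache error, another option might be to register the result of the task that creates the compute resource in the first place and reuse its id for the refresh call, skipping the second query entirely. A minimal sketch, assuming the CR is created in the same role (the creation task below is hypothetical and omits the provider-specific parameters a real VMware CR would need):
# hypothetical: register the creation result so no second API query is needed
- name: "Ensure the VMware compute resource"
  redhat.satellite.compute_resource:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "VMware_Lab"
    provider: "vmware"
    # datacenter, url, user, password, etc. omitted for brevity
    state: "present"
  register: cr_create_result
- name: "Force refresh of Compute Resource API cache"
  ansible.builtin.uri:
    url: "{{ satellite_url }}/api/compute_resources/{{ cr_create_result.entity.compute_resources[0].id }}/refresh_cache"
    method: "PUT"
    user: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    force_basic_auth: true
    validate_certs: "{{ satellite_validate_certs }}"
  register: refresh_result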
Wait, you create a fresh CR and after that the cache is invalid? That sounds like a Foreman bug, not something we should (have to) work around in FAM.
Interestingly, https://github.com/theforeman/foreman/blob/develop/app/models/concerns/compute_resource_caching.rb only calls a refresh automatically after_update, but not after_create (or after_save, which would cover both).
The original refresh was added in https://projects.theforeman.org/issues/19506 / https://github.com/theforeman/foreman/pull/4524
Wonder what @ares thinks about this.
That patch was created only to solve the issue of updating the CR. I'm not sure how the cache could be invalid right after the CR creation, but if it helps, I think replacing after_update with after_save is a good move. There was no deeper logic behind not doing it after CR creation; it just felt unnecessary.
Yeah, I too am curious how this ended up with a "bad" cache, but here we are.
@parmstro if the issue is sufficiently reproducible in your env, could you try patching it to use after_save instead of after_update and see if it makes anything better?
Yes, I will patch it to use after_save and set caching_enabled to true for my next test run. Please note: with only caching_enabled and no code to call cache_refresh, the builder creates the CR, gets through a couple of CPs, and then emits the error. This is very reproducible. With caching_enabled and the code to call cache_refresh (the code queries the API to get the ID of the compute resource so that we can use it in the call), it errors right after the call that creates the CR.
Could you by any chance provide access to a reproducer system?
The systems are built and torn down constantly. I would have to spin one up for you, but that can be done. Let me see if I can work on it. I am in the middle of a test right now with caching_enabled: false. The environment build is past the compute_resource creation and is actively using it to build systems. I am switching it back for the next run and making the edit that you requested in comment 5. Currently building tang hosts, so the environment should be finished in about 90 minutes or so.