manageiq icon indicating copy to clipboard operation
manageiq copied to clipboard

Provisioning freezes during ansible playbook launch

Open kTipSSIoYv opened this issue 3 years ago • 25 comments

Provisioning freezes during ansible playbook (method) launch. It doesn't even launch/run the ansible-playbook command.

Any new provisioning requests also gets frozen. This issue exists on Morphy and Lasker.

image

image

[----] I, [2022-07-03T13:34:47.582215 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Invoking [playbook] method [/ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines/Methods/testing_vm] with inputs [{"vm_id"=>"nil"}]
[----] I, [2022-07-03T13:34:48.736359 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Playbook Method passing options:
[----] I, [2022-07-03T13:34:48.736915 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: nil
  :manageiq:
    automate_workspace: automate_workspaces/19d36f9e-4722-481c-ac2d-f5a12478cfad
    api_url: https://172.17.0.2
    api_token: f5992ebdecd767efd2fcae46e753edb7
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/725/request_tasks/1900
    request: requests/725
  :manageiq_connection:
    url: https://172.17.0.2
    token: f5992ebdecd767efd2fcae46e753edb7
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- 10.10.10.10
:verbosity: '3'
:credential: 96
[----] I, [2022-07-03T13:34:49.434445 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Ansible job with ref 42342 was created.
[----] I, [2022-07-03T13:34:50.512541 #297:2b1baf6ad95c]  INFO -- automation: User [admin] with current group ID [1] name [Tenant My Company/1 access]
[----] I, [2022-07-03T13:34:50.515707 #297:2b1baf6ad95c]  INFO -- automation: Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=19884&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19884&event_stream_id=19884&event_type=AWS_EC2_Subnet_UPDATE&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2022-07-03T13:34:50.528811 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=19884&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19884&event_stream_id=19884&event_type=AWS_EC2_Subnet_UPDATE&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]
[----] I, [2022-07-03T13:34:50.549414 #297:2b1baf6ad95c]  INFO -- automation: Following Relationship [miqaedb:/System/Event/EmsEvent/AMAZON/AWS_EC2_Subnet_UPDATE#create]
[----] I, [2022-07-03T13:34:50.554425 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [miqaedb:/System/Event/EmsEvent/AMAZON/AWS_EC2_Subnet_UPDATE#create  ManageIQ/System/Event/EmsEvent]
[----] I, [2022-07-03T13:34:50.563418 #295:2ad159e6f974]  INFO -- automation: User [admin] with current group ID [1] name [Tenant My Company/1 access]
[----] I, [2022-07-03T13:34:50.564429 #297:2b1baf6ad95c]  INFO -- automation: Following Relationship [miqaedb:/System/event_handlers/refresh#create]
[----] I, [2022-07-03T13:34:50.565566 #295:2ad159e6f974]  INFO -- automation: Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=19885&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19885&event_stream_id=19885&event_type=AWS_EC2_NetworkInterface_CREATE&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2022-07-03T13:34:50.569891 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [miqaedb:/System/event_handlers/refresh#create  ManageIQ/System]
[----] I, [2022-07-03T13:34:50.575273 #295:2ad159e6f974]  INFO -- automation: Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=19885&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19885&event_stream_id=19885&event_type=AWS_EC2_NetworkInterface_CREATE&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]

Here is another example of provisioning freezing. It works fine when its just a single provisioning but when multiple VM counts are set, it freezes after the playbook options.

[----] I, [2022-07-12T03:46:20.248847 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087]) Playbook Method passing options:
[----] I, [2022-07-12T03:46:20.249442 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: i-03bd18e4d331734b8
  :manageiq:
    automate_workspace: automate_workspaces/c866a302-ff83-420e-907f-371273f157c3
    api_url: https://172.17.0.2
    api_token: 88ad01ad237a6d0ac4c27f32098ea2fb
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/821/request_tasks/2087
    request: requests/821
  :manageiq_connection:
    url: https://172.17.0.2
    token: 88ad01ad237a6d0ac4c27f32098ea2fb
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- localhost
:verbosity: '3'
:credential: 96
[----] I, [2022-07-12T03:46:20.252724 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Updated namespace [Cloud/VM/Provisioning/StateMachines/Methods/testing_vm  ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines]
[----] I, [2022-07-12T03:46:20.261804 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Invoking [playbook] method [/ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines/Methods/testing_vm] with inputs [{"vm_id"=>"i-092a45b3bddd9a0f2"}]
[----] I, [2022-07-12T03:46:20.525966 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Playbook Method passing options:
[----] I, [2022-07-12T03:46:20.526465 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: i-092a45b3bddd9a0f2
  :manageiq:
    automate_workspace: automate_workspaces/546e38fc-bd83-41eb-8c8c-a701965eae20
    api_url: https://172.17.0.2
    api_token: c97d29074a3baac85a7aa2000ab84c8a
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/821/request_tasks/2088
    request: requests/821
  :manageiq_connection:
    url: https://172.17.0.2
    token: c97d29074a3baac85a7aa2000ab84c8a
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- localhost
:verbosity: '3'
:credential: 96
[----] I, [2022-07-12T03:46:20.581454 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087]) Ansible job with ref 44048 was created.
[----] I, [2022-07-12T03:46:21.172005 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Ansible job with ref 44049 was created.

If I restart the Docker instance, the rest of the provisioning continues.

kTipSSIoYv avatar Jul 03 '22 15:07 kTipSSIoYv

@kbrock Can you take a look at this one?

Fryguy avatar Jul 06 '22 17:07 Fryguy

@Fryguy @kbrock This issue still persist and it happens when you provisioning more than 1 VM. Playbook executes and provisioning continues when provisioning only 1 VM but we try 2 or 3 VM then the provisioning gets stuck.

I have to restart docker container in order for other requests to continue.

kTipSSIoYv avatar Jul 20 '22 10:07 kTipSSIoYv

This is still an issue. Sometimes ansible job launches and sometimes it doesn't and other jobs ends up as either stuck in active state or ends up pending even if approved.

image

irb(main):001:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694")
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
#<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job:0x000055fe967719e8
 id: 559,
 name: "azure_sg_create.yaml",
 type: "ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job",
 description: nil,
 status: "Pre_execute",
 ems_ref: "66694",
 ancestry: nil,
 ems_id: 1,
 orchestration_template_id: nil,
 created_at: Mon, 08 Aug 2022 15:03:37 UTC +00:00,
 updated_at: Mon, 08 Aug 2022 15:03:37 UTC +00:00,
 retired: nil,
 retires_on: nil,
 retirement_warn: nil,
 retirement_last_warn: nil,
 retirement_state: nil,
 retirement_requester: nil,
 status_reason: nil,
 cloud_tenant_id: nil,
 resource_group: nil,
 start_time: nil,
 finish_time: nil,
 configuration_script_base_id: 126,
 verbosity: 3,
 hosts: nil,
 evm_owner_id: nil,
 miq_group_id: nil,
 tenant_id: nil>
=> #<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job id: 559, name: "azure_sg_create.yaml", type: "ManageIQ::Providers::EmbeddedAnsible::AutomationMa...", description: nil, status: "Pre_execute", ems_ref: "66694", ancestry: nil, ems_id: 1, orchestration_template_id: nil, created_at: "2022-08-08 15:03:37", updated_at: "2022-08-08 15:03:37", retired: nil, retires_on: nil, retirement_warn: nil, retirement_last_warn: nil, retirement_state: nil, retirement_requester: nil, status_reason: nil, cloud_tenant_id: nil, resource_group: nil, start_time: nil, finish_time: nil, configuration_script_base_id: 126, verbosity: 3, hosts: nil, evm_owner_id: nil, miq_group_id: nil, tenant_id: nil>

kTipSSIoYv avatar Aug 08 '22 18:08 kTipSSIoYv

@kTipSSIoYv what do you see for the playbook stdout?

pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout_json

agrare avatar Aug 08 '22 18:08 agrare

@agrare Is there a typo in the command? I see a syntax error.

SyntaxError ((irb):15: syntax error, unexpected local variable or method, expecting `end')
..._ref => "66694")raw_stdout_json
...                ^~~~~~~~~~~~~~~

kTipSSIoYv avatar Aug 08 '22 18:08 kTipSSIoYv

Updated, you need the .

agrare avatar Aug 08 '22 18:08 agrare

irb(main):016:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout_json
Traceback (most recent call last):
       15: from bin/rails:4:in `<main>'
       14: from bootsnap (1.11.1) lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
       13: from bootsnap (1.11.1) lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
       12: from railties (6.0.5.1) lib/rails/commands.rb:18:in `<main>'
       11: from railties (6.0.5.1) lib/rails/command.rb:46:in `invoke'
       10: from railties (6.0.5.1) lib/rails/command/base.rb:69:in `perform'
        9: from thor (1.2.1) lib/thor.rb:392:in `dispatch'
        8: from thor (1.2.1) lib/thor/invocation.rb:127:in `invoke_command'
        7: from thor (1.2.1) lib/thor/command.rb:27:in `run'
        6: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:102:in `perform'
        5: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:19:in `start'
        4: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:70:in `start'
        3: from (irb):15
        2: from (irb):16:in `rescue in irb_binding'
        1: from activemodel (6.0.5.1) lib/active_model/attribute_methods.rb:429:in `method_missing'
NoMethodError (private method `raw_stdout_json' called for #<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job:0x000055fe95d4a050>)
Did you mean?  raw_stdout


irb(main):017:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout
""
=> ""

kTipSSIoYv avatar Aug 08 '22 18:08 kTipSSIoYv

Okay someone who actually knows embedded Ansible is going to have to help you sorry @kTipSSIoYv

agrare avatar Aug 08 '22 19:08 agrare

@kbrock @agrare Now the other jobs are running.

image

kTipSSIoYv avatar Aug 08 '22 19:08 kTipSSIoYv

What did the stdout show? What changed to make it work?

jrafanie avatar Aug 08 '22 19:08 jrafanie

@kTipSSIoYv do you mean they're working, or the older ones timed out and now the latest is hung

agrare avatar Aug 08 '22 19:08 agrare

@jrafanie @agrare after running the pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout command other stuck jobs started to work. It was stuck until then. Note: Playbook fails right now without the variables and that's why the status is error.

kTipSSIoYv avatar Aug 08 '22 19:08 kTipSSIoYv

@kTipSSIoYv is that reproducible?

agrare avatar Aug 09 '22 14:08 agrare

@agrare Yea its reproducible. Usually when you launch multiple count of servers using a single catalog it gets stuck during ansible playbook launch.

kTipSSIoYv avatar Aug 11 '22 02:08 kTipSSIoYv

@Fryguy In the latest version of Narjdorf Ansible job isn't launching at all. I see the following log entry but ansible runner isn't launching at all.

[----] I, [2022-09-05T16:20:05.401602 #324:8cdc] INFO -- automation: Q-task_id([r1169_miq_provision_2912]) Ansible job with ref 92663 was created.

kTipSSIoYv avatar Sep 05 '22 16:09 kTipSSIoYv

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Feb 27 '23 00:02 miq-bot

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Jun 12 '23 00:06 miq-bot

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Oct 16 '23 00:10 miq-bot

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

miq-bot avatar Jan 22 '24 00:01 miq-bot