
[Bug] [zos_operator] "BGYSC0822E Console activation failed" when more than one host (on the same IP address) running in parallel

Open LuiggiTorricelli opened this issue 11 months ago • 9 comments

Is there an existing issue for this?

  • [X] There are no existing issues.

Bug description

Hello team, I tested running the zos_operator module against more than one host (on the same IP address) at the same time and got the error below:

{"changed": false, "msg": "OperatorCmdError('D IPLINFO', 12, ['', 'Out: ', 'Err: BGYSC0804I Using timeout of 300 centiseconds.', 'BGYSC0801I CONSBUFPGNUM=128 - Console command output buffer memory size to be allocated: 131072 bytes.', 'BGYSC0806I Using console name of LW090000.', 'BGYSC0805I Using CART of ZOAU0380.', 'BGYSC0822E Console activation failed: Extended console activation (MCSOPER) failed RC=04,RSN=00', '', 'Ran: D IPLINFO'])"}

So, when running against the three hosts NWRD_DTF2, NWRD_DB2R, and NWRD_DB1U, one of them finishes successfully and the other two fail with the error above. Note that all three hosts use the same IP address.

This is the failing step of the playbook:

    - ibm.ibm_zos_core.zos_operator:
        cmd: "D IPLINFO"
        verbose: true
        wait_time_s: 3
      register: output

In the z/OS syslog, I couldn't find any RACF error messages, and the user ID being used has access to the MVS.MCSOPER.* profile in the FACILITY class.

I suspect this might be related to the message "BGYSC0806I Using console name of LW090000". It seems like one of the calls activates the console LW090000 and the other two try to activate the same one, but fail because it is already active. Not sure about it, though.

If I'm not mistaken, I didn't have issues with this about 1.5 to 2 years ago.

Could you help me identify what is happening here?

Thanks!

IBM z/OS Ansible core Version

v1.8.0 (default)

IBM Z Open Automation Utilities

v1.2.5 (default)

IBM Enterprise Python

v3.11.x (default)

ansible-version

v2.15.x

z/OS version

v2.5 (default)

Ansible module

zos_operator

Playbook verbosity output.

output_operator.txt

Ansible configuration.

[defaults]
vault_password_file=/home/luiggi/ansible/.vault_password

Contents of the inventory

all:
  hosts:
    localhost:
  children:
    sandbox:
      children:
        AMXT:
          children:
            NWRD:
              hosts:
                NWRD_DB1U:
                NWRD_DB2R:
                NWRD_DTF2:

Contents of group_vars or host_vars

group_vars/all.yml
--------------------------------------
ansible_ssh_pipelining: false
ansible_user: LW0966A
jobcard:
  msgclass: R
  class: G
  account: "UU999999999,T895"
  notify: "@"
--------------------------------------



group_vars/AMXT.yml
--------------------------------------
PYZ: "/usr/lpp/IBM/cyp/v3r11/pyz"
ZOAU: "/usr/lpp/IBM/zoautil"
ansible_python_interpreter: "{{ PYZ }}/bin/python3"
environment_vars:
    ZOAU_HOME: "{{ ZOAU }}"
    #PYTHONPATH: "{{ ZOAU }}/lib"
    LIBPATH: "{{ ZOAU }}/lib:{{ PYZ }}/lib:/usr:/usr/lib:/lib"
    PATH: "{{ ZOAU }}/bin:{{ PYZ }}/bin:/bin:/usr/sbin:/usr/bin"
    _BPXK_AUTOCVT: "ON"
    _CEE_RUNOPTS: "FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)"
    _TAG_REDIR_ERR: "txt"
    _TAG_REDIR_IN: "txt"
    _TAG_REDIR_OUT: "txt"
    LANG: "C"
--------------------------------------



group_vars/NWRD.yml
--------------------------------------
ansible_host: "xxxx.xxxx.xxxx.xxxx" <- omitting the address for security purposes
--------------------------------------



host_vars/NWRD_DB1U.yml
host_vars/NWRD_DB2R.yml
host_vars/NWRD_DTF2.yml
--------------------------------------
No relevant variables here. Only variables used for our internal processes.

LuiggiTorricelli avatar Mar 19 '24 14:03 LuiggiTorricelli

Also hit this today. The solution seems to be adding a console option to the zos_operator module that would allow a user to override the default name of the generated EMCS console. Currently, as no parameter is passed, the name is always the first 4 characters of the user ID with 4 zeros padded on the end, e.g. HUGH0000 in my case.
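
For illustration only, the naming scheme described above can be mimicked in Jinja like this (a sketch of the observed behavior, not the module's actual code; `default_console_name` is just an illustrative variable name):

    # Sketch only: first 4 characters of the user ID plus 4 zeros.
    # "LW0966A" -> "LW090000", "HUGHES" -> "HUGH0000".
    default_console_name: "{{ ansible_user[:4] }}0000"

This also explains the reporter's collision: all three hosts connect with the same user ID (LW0966A), so all three tasks derive the same console name, LW090000.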

A console parameter could then be passed to the console option on the Python opercmd API, meaning you could alter the console being created by opercmd in parallel-running tasks. I did a quick mock-up of that in https://github.com/ansible-collections/ibm_zos_core/compare/dev...andrewhughes101:ibm_zos_core:opercmd-console to see if it fixed our issue, which it did.
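
For illustration, here is how a task might use such an option (hedged: the `console` parameter exists only in the linked branch, not in any released version, and the naming expression is just one way to make names unique per host):

    - ibm.ibm_zos_core.zos_operator:
        cmd: "D IPLINFO"
        # Proposed parameter from the linked branch (not yet released):
        # give each host in the batch a distinct 2-8 character console
        # name so parallel MCSOPER activations no longer collide.
        console: "LW09{{ ansible_play_batch.index(inventory_hostname) }}"
      register: output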

andrewhughes101 avatar Mar 20 '24 15:03 andrewhughes101

Need to research and potentially coordinate with ZOAU.
Can we get multiline responses? Can we re-use the current operator/terminal ID, e.g. within a playbook? Can we check that the ID is available?

We may need to create state with a playbook ID on either the target or the controller.

richp405 avatar Mar 21 '24 18:03 richp405

Ideally we would not want to ask the user for a 4-digit suffix; it seems like something we could manage from within the module. Imagine a use case where multiple playbooks are running on the same managed node: each playbook would have to be cognizant of the other playbooks' console suffixes, which would force a design where these values have to be shared between playbooks.

The better way is to introduce an action plugin which can create a unique suffix, by using the ansible-playbook PID and/or the date/time, and store that as a fact so other tasks can read it, should the task be a zos_operator task.
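
A minimal playbook-level sketch of that idea (the real fix would live inside the module's action plugin; `console_suffix` is just an illustrative fact name, and minutes+seconds is a crude stand-in for a proper PID/timestamp-based suffix):

    - name: Derive a unique console suffix for this run (sketch)
      ansible.builtin.set_fact:
        # now() is evaluated on the controller at run time, so two
        # playbooks started at different times get different suffixes.
        console_suffix: "{{ now(fmt='%M%S') }}"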

ddimatos avatar Apr 05 '24 05:04 ddimatos

We are trying to run multiple playbooks concurrently and experiencing the following errors -->

Err: BGYSC0822E Console activation failed: Extended console activation (MCSOPER) failed RC=04,RSN=00

We suspect that allowing a console name parameter to zos_operator command would resolve this issue as well

paruljain11 avatar Apr 30 '24 14:04 paruljain11

In the interim, the workaround is to use throttle with the zos_operator task, e.g., throttle: 1.

ddimatos avatar May 05 '24 03:05 ddimatos

As a workaround you can use the throttle option like this:

    - name: "{{ nodeType }} STC running check using cmd"
      delegate_to: zceeZosLpar
      ibm.ibm_zos_core.zos_operator:
        cmd: "$D JQ''{{ lookup('vars', inventory_hostname).nodeName|upper }}''"
      register: bc2Result
      throttle: 1

Then, for this step, each node is processed sequentially, which avoids the console collision issue.

EdwardMcCarthy avatar May 05 '24 21:05 EdwardMcCarthy

I don't think that throttle will help if there are multiple playbooks running in parallel with the same user on the same system.

roded avatar May 06 '24 06:05 roded

In my case I am only running one playbook, so it works fine. For multiple playbooks with the same user ID against the same z/OS system, throttle won't be applicable, since it only works within one playbook.

EdwardMcCarthy avatar May 06 '24 07:05 EdwardMcCarthy

I went back and looked over the conversation and case we had with Edward. Throttle works for them because the playbook is managing a single z/OS node with multiple (I assume) zos_operator commands; thus throttling to only 1 zos_operator command at a time prevents console collisions. This could also have been achieved with forks reduced to 1, but that has other performance implications if async is used.
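
(For reference, forks is set in ansible.cfg, alongside the defaults shown earlier in this issue, or via --forks on the command line:)

[defaults]
forks = 1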

In your case @roded, if you don't mind a bit of a performance slowdown, I believe you could try using serial to force only one host at a time to connect to the z/OS managed node. See the docs on controlling batch size.

---
- hosts: all
  serial: 1
  gather_facts: false

This is being worked on this quarter, but I wanted to provide a possible workaround.

ddimatos avatar May 06 '24 19:05 ddimatos

With ZOAU < 1.3.2:

[screenshot]

With ZOAU 1.3.2:

[screenshot]

AndreMarcel99 avatar Jun 25 '24 17:06 AndreMarcel99