hanging process with 100% CPU since 0.3.28
Using Mitogen 0.3.29 we observed some machines with 100% CPU usage on one core. The hanging process is a Python process of the Mitogen parent. When our pipelines run Ansible multiple times, multiple instances of such a process can accumulate, increasing CPU load even further.
After some debugging we found that it is this loop that keeps running, never ending:
while PREAMBLE_COMPRESSED_LEN-len(C)and select.select([0],[],[]):C+=os.read(0,PREAMBLE_COMPRESSED_LEN-len(C))
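The same one-liner, reformatted for readability (whitespace only, behaviour unchanged):

while PREAMBLE_COMPRESSED_LEN - len(C) and select.select([0], [], []):
    C += os.read(0, PREAMBLE_COMPRESSED_LEN - len(C))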
A strace showed the process is calling this over and over again:
read(0, "", 18277) = 0
pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
read(0, "", 18277) = 0
pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
This was not happening on all machines nor for each run of our Ansible pipeline.
Workaround: after reverting to Mitogen 0.3.27, no more problems were seen.
I think this is related to the changes in #1307. Maybe because we don't use sudo logging?!
Setup:
We use SSH with an unprivileged user and passwordless sudo to become root. We execute Ansible in a Docker container running an execution environment, so each run uses exactly the same environment.
requirements.txt:
ansible-lint==25.9.2
ansible-navigator==25.9.0
jmespath==1.0.1
lxml==6.0.2
passlib==1.7.4
sarif-tools==3.0.5
mitogen==0.3.29
For completeness, what is the OS and Python version on the affected machines?
Notes to self
pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
pselect6() asked whether a single fd (fd=0) was ready to read (i.e. read() would not block); the return value indicates that this single fd=0 is ready
read(0, "", 18277) = 0
Attempted to read up to 18277 bytes; returned immediately (i.e. didn't block) with 0 bytes, which on a pipe means EOF.
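At EOF, select() keeps reporting the fd as readable and read() keeps returning 0 bytes, so the loop above can never make progress. A minimal standalone sketch (not Mitogen code) that reproduces the spin against a pipe instead of stdin:

import os
import select

PREAMBLE_COMPRESSED_LEN = 18277            # stands in for the real preamble size

r, w = os.pipe()
os.write(w, b"only part of the preamble")  # truncated input
os.close(w)                                # writer gone -> EOF on r

C = b''
spins = 0
while PREAMBLE_COMPRESSED_LEN - len(C) and select.select([r], [], []):
    chunk = os.read(r, PREAMBLE_COMPRESSED_LEN - len(C))
    C += chunk
    if not chunk:      # EOF: select() stays "readable", read() returns b''
        spins += 1
        if spins > 3:  # the real loop has no such check, so it spins forever
            print('stuck at EOF with', len(C), 'of', PREAMBLE_COMPRESSED_LEN, 'bytes')
            break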
Machines are running Ubuntu 24.04 LTS and Debian 12. Python is 3.12.3 or 3.11.2.
Brainstorming possibilities (with input from claude.ai), in roughly descending order of likelihood by my gut feeling:
- EOF - remote end of pipe/socket that feeds fd=0 closed, or was reset
- Race condition - something else read the data, between select() returning and calling read()
  - another part of Mitogen
  - something in Ansible
  - something in PAM, ssh, or sudo (e.g. a plugin)
  - wildcard - e.g. third-party audit, security tool, agent, etc.
- select() false positive (I'm skeptical)
- EOF - remote end of pipe/socket that feeds fd=0 closed, or was reset
If this is the case, does the parent notice? Does it do anything about it? Can it?
@DanielRaapDev
- When you observed processes at 100% CPU were any failures or retries reported on the controller?
- Do you have any logs or stdout from such an incident?
- As a rough estimate how often did you observe it? E.g. 50% of playbook executions, once per 1000 executions
- Do you have anything out of the ordinary in your authentication/authorization stack? E.g. single sign-on, policy agents, endpoint management, custom SSH config, sudo plugins, PAM plugins
Rather than diagnose/fix the exact cause, fixing the symptom may be more robust. E.g. have the first stage self-destruct if it hasn't reached the second stage within N seconds.
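For illustration only (not a proposed patch), such a self-destruct could be a SIGALRM armed around the preamble read, e.g.:

import os
import select
import signal
import sys

PREAMBLE_COMPRESSED_LEN = 18277   # placeholder; supplied by the parent in reality
BOOT_TIMEOUT = 30                 # illustrative value for "N seconds"

def _give_up(signum, frame):
    # Second stage never arrived in time: exit instead of spinning.
    os.write(2, b'first stage timed out waiting for preamble\n')
    os._exit(1)

signal.signal(signal.SIGALRM, _give_up)
signal.alarm(BOOT_TIMEOUT)        # arm the self-destruct

C = b''
while PREAMBLE_COMPRESSED_LEN - len(C):
    select.select([0], [], [])
    chunk = os.read(0, PREAMBLE_COMPRESSED_LEN - len(C))
    if not chunk:                 # EOF: the feeding pipe closed early, fail loudly
        sys.exit(1)
    C += chunk

signal.alarm(0)                   # disarm once the preamble fully arrived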
- Good that you ask. I looked in the old logs and found an error:
PLAY [name of our play] *****************
[ERROR]: [mux 38] 09:13:08.341470 E mitogen.[ssh.billing-test.buero.subshell.io]: while importing 'ansible.module_utils.json_utils'
Traceback (most recent call last):
File "<stdin>", line 1674, in exec_module
File "master:/usr/local/lib/python3.14/site-packages/ansible/module_utils/json_utils.py", line 27
SyntaxError: future feature annotations is not defined
But the increased CPU usage was at a later run when no such error was in the log 🤔
- No uncommon output here. Our check runs are very minimal due to summary-only output. See ansible.cfg below.
- About 50-70% of machines were affected, but only maybe 5-10% of the Jobs left such a process behind. So there must be a race condition involved.
- No fancy auth here, just SSH pub key with local key files. See ansible.cfg above for custom settings.
[defaults]
interpreter_python=auto_silent
inventory=inventory/hosts.yml
remote_user=sa_ansible
callbacks_enabled = ansible.posix.profile_tasks,ansible.posix.profile_roles
forks=32
# disable HostKeyChecking so Mitogen works on Jenkins Hosts
host_key_checking=False
gathering=smart
[privilege_escalation]
# Default to sudo:
become=True
[ssh_connection]
# Keep SSH parameters in place, so Ansible execution without Mitogen will use better SSH connection
ssh_args = -o ControlMaster=auto -o ControlPersist=600 -o PreferredAuthentications=publickey
# keeping this empty prevents too long unix_socket path error
control_path =
pipelining = True
[callback_profile_roles]
summary_only = True
[callback_profile_tasks]
summary_only = True
Btw. most Jobs only run in check mode.
Rather than diagnose/fix the exact cause, fixing the symptom may be more robust. E.g. have the first stage self-destruct if it hasn't reached the second stage within N seconds.
Yeah, the synchronous call prior to #1307 did not have this issue. So something with the new way of reading the data fails where the previous one detected that case! Or it never happened?
Thanks for your work :)
Yeah, the synchronous call prior to #1307 did not have this issue. So something with the new way of reading the data fails where the previous one detected that case! Or it never happened?
Previously it was using a single-shot fp.read(N). So if the input was truncated then zlib.decompress() would most likely have thrown an unhandled exception, thus the process(es) would have exited with a non-zero status.
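Roughly speaking (illustrative, not the actual Mitogen code), a truncated blob fails loudly at decompression, whereas the new retry loop can spin at EOF:

import zlib

payload = zlib.compress(b"print('second stage bootstrap')")
truncated = payload[:len(payload) // 2]     # simulate a short, single-shot read

try:
    exec(zlib.decompress(truncated))        # old style: blows up right here
except zlib.error as exc:
    print('decompress failed, process exits non-zero:', exc)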
Please could you try https://github.com/mitogen-hq/mitogen/commit/9cc0ab0823b0280f5309d0f102a42f2cceb99b57, from #1349
I am able to somewhat reproduce this by aborting ansible-playbook runs (against ~200 hosts) with Ctrl-C.
There seem to be 2 types of "stuck" processes:
- &yellow: idle processes
- &red: busy-looping processes
Due to this issue I have some monitoring data to share, which shows some of the problematic processes:
stuck process examples
color cputime runtime pid user cmd
&red 0.1h 0.1h 279437 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
ansible
&yellow 0.0h 162.3h 732268 732262 root /usr/bin/python3( mitogen:user@host:3700546)
&red 153.1h 162.3h 732269 732268 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.3h 0.3h 4104182 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
stuck processes
&red 150.6302777777778h >10 732269 732268 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.2h 0.2h 35960 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
ansible
&yellow 0.0h >10 732268 732262 root /usr/bin/python3( mitogen:user@host:3700546)
&red 152.79694444444445h >10 732269 732268 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 90.6h 348812 root /usr/bin/python3( mitogen:user@host:712040)
&red 78.3h 90.6h 348813 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.1h 0.1h 663867 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.0h 0.1h 2463164 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 90.6h 2248518 root /usr/bin/python3( mitogen:user@host:711582)
&red 78.4h 90.6h 2248519 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.1h 0.1h 876922 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 1022.4h 3222703 root /usr/bin/python3( mitogen:user@host:2542363)
&red 1011.7h 1022.4h 3222704 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 162.6h 3688708 root /usr/bin/python3( mitogen:user@host:3700546)
&red 153.4h 162.6h 3688709 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 2786038 root /usr/bin/python3( mitogen:user@host:756221)
&red 651.3h 666.7h 2786039 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.1h 0.1h 469758 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 2038373 root /usr/bin/python3( mitogen:user@host:757665)
&red 652.1h 666.7h 2038374 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 90.6h 1572053 root /usr/bin/python3( mitogen:user@host:711582)
&red 78.3h 90.6h 1572055 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 830797 root /usr/bin/python3( mitogen:user@host:756221)
&red 652.2h 666.7h 830798 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.2h 0.3h 3344750 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 38.0h 3374195 root /usr/bin/python3( mitogen:user@host:3092216)
&red 18.8h 38.0h 3374196 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 38.5h 3374195 root /usr/bin/python3( mitogen:user@host:3092216)
&red 19.3h 38.5h 3374196 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 38.8h 2528754 root /usr/bin/python3( mitogen:user@host:3092216)
&red 18.8h 38.8h 2528755 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 1.7h 1100427 root /usr/bin/python3( mitogen:user@host:668520)
&yellow 0.0h 1.7h 1100432 root /usr/bin/python3( mitogen:user@host:668520)
&red 0.1h 0.1h 1114730 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.0h 0.0h 3374196 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 1826395 root /usr/bin/python3( mitogen:user@host:757665)
&red 632.1h 666.7h 1826396 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.2h 0.2h 1131234 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 858.6h 3021055 root /usr/bin/python3( mitogen:user@host:57150)
&red 845.8h 858.6h 3021056 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 1.2h 1100427 root /usr/bin/python3( mitogen:user@host:668520)
&yellow 0.0h 1.2h 1100432 root /usr/bin/python3( mitogen:user@host:668520)
&yellow 0.0h 666.7h 3954410 root /usr/bin/python3( mitogen:user@host:756221)
&red 649.3h 666.7h 3954411 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 162.6h 1483873 root /usr/bin/python3( mitogen:user@host:3700546)
&red 153.4h 162.6h 1483874 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 714.6h 2180811 root /usr/bin/python3( mitogen:user@host:7664)
&red 707.0h 714.6h 2180812 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 38.4h 698811 root /usr/bin/python3( mitogen:user@host:3092216)
&red 19.3h 38.4h 698812 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 782.8h 1569058 root /usr/bin/python3( mitogen:user@host:756221)
&red 769.2h 782.8h 1569059 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 714.6h 1081190 root /usr/bin/python3( mitogen:user@host:7664)
&red 706.9h 714.6h 1081191 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 1012717 root /usr/bin/python3( mitogen:user@host:756221)
&red 644.1h 666.7h 1012718 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 162.6h 3905890 root /usr/bin/python3( mitogen:user@host:3700546)
&red 153.4h 162.6h 3905891 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 714.6h 805821 root /usr/bin/python3( mitogen:user@host:7664)
&red 706.7h 714.6h 805822 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.2h 0.2h 987350 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.2h 0.2h 1409297 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 90.6h 1874827 root /usr/bin/python3( mitogen:user@host:711582)
&red 78.4h 90.6h 1874828 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.1h 0.1h 4168476 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.0h 0.1h 1463751 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 0.3h 0.3h 2978525 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 162.6h 969013 root /usr/bin/python3( mitogen:user@host:3700546)
&red 153.4h 162.6h 969014 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow 0.0h 666.7h 291047 root /usr/bin/python3( mitogen:user@host:756221)
&red 646.2h 666.7h 291048 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
Usually when Ctrl-C-ing it immediately aborts with ^C[ERROR]: User interrupted execution. But sometimes mitogen throws red walls of errors or exceptions and I had to hit Ctrl-C multiple times. Rarely I had to kill the ansible-playbook processes because they never aborted even after many Ctrl-Cs.
I am now testing your patch and will report back later.
&yellow 0.0h 1.2h 1100427 root /usr/bin/python3( mitogen:user@host:668520)
All the yellow (idle) processes are fork parents with argv /path/to/python(<mitogen parent>) - they've exec()d a fresh Python, and it is waiting for code to execute on its stdin, which is connected to a pipe shared with its fork child.
&red 1011.7h 1022.4h 3222704 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64\("xxx")))</p>
All the red processes (busy loop, presumed read/select) are fork children. These will be subject to the new timeout.
If the timeout fires it should terminate the fork child (red), closing the pipe file descriptors it holds, in turn causing the fork parent's stdin to close, in turn causing that Python to exit.
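A small sketch of that cascade (Linux only, illustrative): once the last writer of the pipe goes away, the reader's read() returns EOF and it can exit.

import os
import signal
import time

r, w = os.pipe()

pid = os.fork()
if pid == 0:
    # Plays the "fork child": holds the write end but never sends the bootstrap.
    os.close(r)
    time.sleep(600)
    os._exit(0)

os.close(w)                   # reader must not keep its own copy of the write end
os.kill(pid, signal.SIGTERM)  # what the new timeout would effectively cause
os.waitpid(pid, 0)            # child gone -> its copy of the write end is closed

data = os.read(r, 4096)       # returns b'' immediately: EOF, so the reader can exit
print('read returned %r, reader exits' % (data,))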
I was not able to reproduce any more busy-looping (red) processes using 9cc0ab0, from https://github.com/mitogen-hq/mitogen/pull/1349 cherry-picked on top of the most recent master fdb5c625.
But there still were some leftover idle processes on 19 remote hosts, which were all connected to processes on my control node host (0 09:52 2227217 2211716 user ssh -o LogLevel ERROR -l root -o Compression yes -o ServerAliveInterval 30 -o ServerAliveCountMax 10 -o BatchMode yes -o StrictHostKeyChecking yes -o ControlMaster=auto -o PreferredAuthentications=publickey -o ServerAliveInterval=30 -F .ssh_config remote-host /usr/bin/python3 -c 'import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,signal,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))):
Leftover idle processes
Output of ps -eo cputimes,stime,pid,ppid,user:12,args:
0 09:48 1478214 1478210 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478217 1478214 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478228 1478224 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478231 1478228 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1476516 1476512 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1476519 1476516 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1481142 1481138 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1481145 1481142 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1255576 1255572 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1255579 1255576 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261001 1260997 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261004 1261001 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1147805 1147801 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1147808 1147805 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260701 1260697 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260704 1260701 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261065 1261061 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261068 1261065 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260123 1260119 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260126 1260123 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260560 1260556 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1260563 1260560 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1259975 1259971 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1259978 1259975 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261033 1261029 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1261036 1261033 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:50 1664891 1664887 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 1664894 1664891 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 1983643 1983639 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 1983646 1983643 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 1302747 1302743 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 1302750 1302747 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 2759687 2759683 root /usr/bin/python3(mitogen:user@host:2206537)
0 09:50 2759690 2759687 root /usr/bin/python3(mitogen:user@host:2206537)
The remote processes terminated after I killed the processes on my control node host.
Some errors encountered during Ctrl-C abort testing
^CException ignored in: <function _after_fork at 0x7f33765f3600>
Traceback (most recent call last):
File "/usr/lib/python3.11/threading.py", line 1638, in _after_fork
[ERROR]: User interrupted execution
Process WorkerProcess-134:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 159, in _detach
os.setsid()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 176, in wrap_worker__run
return mitogen.core._profile_hook('WorkerProcess',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
return func(*args)
^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 177, in <lambda>
lambda: worker__run(self)
^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 192, in run
self._detach()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 177, in _detach
display.error(f'Could not detach from stdio: {e}')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
TASK [Gathering Facts] ************************************************************************************************
^CException ignored in: <function _releaseLock at 0x7f7ac756dee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
[ERROR]: [mux 2168469] 09:25:53.157721 E mitogen.unix: listener: failed to assign identity to PID 2168543: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.162541 E mitogen.unix: listener: failed to assign identity to PID 2168544: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178098 E mitogen.unix: listener: failed to assign identity to PID 2168548: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178351 E mitogen.unix: listener: failed to assign identity to PID 2168551: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178432 E mitogen.unix: listener: failed to assign identity to PID 2168556: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178493 E mitogen.unix: listener: failed to assign identity to PID 2168557: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178551 E mitogen.unix: listener: failed to assign identity to PID 2168560: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178612 E mitogen.unix: listener: failed to assign identity to PID 2168563: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178694 E mitogen.unix: listener: failed to assign identity to PID 2168566: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178746 E mitogen.unix: listener: failed to assign identity to PID 2168569: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178792 E mitogen.unix: listener: failed to assign identity to PID 2168574: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178840 E mitogen.unix: listener: failed to assign identity to PID 2168575: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178885 E mitogen.unix: listener: failed to assign identity to PID 2168578: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178931 E mitogen.unix: listener: failed to assign identity to PID 2168581: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.178975 E mitogen.unix: listener: failed to assign identity to PID 2168584: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.179019 E mitogen.unix: listener: failed to assign identity to PID 2168587: [Errno 32] Broken pipe
[ERROR]: [mux 2168469] 09:25:53.179061 E mitogen.unix: listener: failed to assign identity to PID 2168590: [Errno 32] Broken pipe
[ERROR]: Task failed: Connection timed out.
fatal: [space-pc102]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Connection timed out.'
unreachable: true
ok: [guenther41]
^C[ERROR]: User interrupted execution
[ERROR]: [mux 2167350] 09:25:46.890245 E mitogen: Broker(d290): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
TASK [Gathering Facts] ************************************************************************************************
[ERROR]: [mux 2171668] 09:36:07.256212 E mitogen: Broker(fed0): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
^C[ERROR]: User interrupted execution
TASK [Gathering Facts] ************************************************************************************************
^C[ERROR]: User interrupted execution
Process WorkerProcess-25:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 175, in wrap_worker__run
ansible_mitogen.affinity.policy.assign_worker()
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 251, in assign_worker
self._balance('WorkerProcess')
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 230, in _balance
self._set_cpu(descr, self._reserve_shift + (
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 235, in _set_cpu
self._set_affinity(descr, 1 << (cpu % self.cpu_count))
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 220, in _set_affinity
self._set_cpu_mask(mask)
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 281, in _set_cpu_mask
_sched_setaffinity(tid, len(s), s)
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 317, in _bootstrap
util._exit_function()
File "/usr/lib/python3.11/multiprocessing/util.py", line 320, in _exit_function
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^C
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed
Unexpected Exception, this is probably a bug.
<<< caused by >>>
process object is closed
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
play_return = strategy.run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
return mitogen.core._profile_hook('Strategy',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
return func(*args)
^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
lambda: run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
self._queue_task(host, task, task_vars, play_context)
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
return super(StrategyMixin, self)._queue_task(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
worker_prc = WorkerProcess(
^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/queues.py", line 43, in __init__
self._rlock = ctx.Lock()
^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
return Lock(ctx=self.get_context())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 58, in __init__
kind, value, maxvalue, self._make_name(),
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 117, in _make_name
next(SemLock._rand))
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/tempfile.py", line 293, in __next__
return ''.join(self.rng.choices(self.characters, k=8))
^^^^^^^^
File "/usr/lib/python3.11/tempfile.py", line 281, in rng
@property
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
result = self._tqm.run(play=play)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
exit_code = cli.run()
^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
results = pbex.run()
^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
self._tqm.cleanup()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7f528716dee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
ok: [guenther10]
^C
-rda:~/git/ansible-repo[±]$ [ERROR]: [mux 2203330] 09:48:52.032133 E mitogen.service: Pool(8ed0, size=32, th='mitogen.Pool.8ed0.14'): while invoking 'propagate_paths_and_modules' of 'mitogen.service.PushFileService'
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 305, in invoke
response = self._invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 291, in _invoke
ret = method(**kwargs)
^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 758, in propagate_paths_and_modules
self.propagate_to(context, mitogen.core.to_text(path), overridden_source)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 793, in propagate_to
self._forward(context, path)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 712, in _forward
child = self.router.context_by_id(stream.protocol.remote_id)
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'protocol'
^C
^C[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/blockdev/tasks/lvm.yml:50:3
48 label: "vg: {{ lv.0.name }}, {{ lv.1 }}"
49
50 - name: deploy lv filesystems
^ column 3
<<< caused by >>>
Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
failed: [funkadelic] (item=vg: vg0, {'key': 'tmp', 'value': {'mount': '/tmp', 'size': '10G', 'mode': '1777'}}) =>
ansible_loop_var: lv
changed: false
lv:
- lvs:
- key: root
value:
mount: /
mount_options: noatime,nodiratime,errors=remount-ro
mount_pass: '1'
size: 10G
- key: log
value:
mount: /var/log
size: 2G
- key: tmp
value:
mode: '1777'
mount: /tmp
size: 10G
- key: swap
value:
fs: swap
mount: none
mount_options: defaults
mount_pass: '0'
size: 4G
- key: scr
value:
fs_options: -m 0
group: 1893
mode: '1770'
mount: /scratch
owner: 1893
size: 30G
- key: home
value:
mount: /home
size: 30G
name: vg0
pv_id: ata-APPLE_SSD_SM0256G_S2PANYAGB03426-part2
- key: tmp
value:
mode: '1777'
mount: /tmp
size: 10G
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
[ERROR]: User interrupted execution
^CException ignored in: <function _releaseLock at 0x7f5250f65ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
[ERROR]: Task failed: Channel was disconnected while connection attempt was in progress; this may be caused by an abnormal Ansible exit, or due to an unreliable target.
fatal: [guenther55]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
unreachable: true
fatal: [guenther54]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
unreachable: true
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
Task failed.
<<< caused by >>>
Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
fatal: [guenther51]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
fatal: [guenther52]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
fatal: [guenther49]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
fatal: [guenther53]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
unreachable: true
fatal: [guenther50]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
[ERROR]: Task failed: EOF on stream; last 100 lines received:
MITO000
fatal: [guenther56]: UNREACHABLE! =>
changed: false
msg: |-
Task failed: EOF on stream; last 100 lines received:
MITO000
unreachable: true
ok: [guenther58]
ok: [guenther59]
^C
^C[ERROR]: User interrupted execution
[ERROR]: [mux 2211812] 09:52:27.265308 E mitogen.service: Pool(92d0, size=32, th='mitogen.Pool.92d0.1'): while invoking 'propagate_paths_and_modules' of 'mitogen.service.PushFileService'
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 305, in invoke
response = self._invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 291, in _invoke
ret = method(**kwargs)
^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 758, in propagate_paths_and_modules
self.propagate_to(context, mitogen.core.to_text(path), overridden_source)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 793, in propagate_to
self._forward(context, path)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 712, in _forward
child = self.router.context_by_id(stream.protocol.remote_id)
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'protocol'
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed
Unexpected Exception, this is probably a bug.
<<< caused by >>>
process object is closed
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
play_return = strategy.run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
return mitogen.core._profile_hook('Strategy',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
return func(*args)
^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
lambda: run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
self._queue_task(host, task, task_vars, play_context)
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
return super(StrategyMixin, self)._queue_task(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
worker_prc = WorkerProcess(
^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 86, in __init__
super(WorkerProcess, self).__init__()
File "/usr/lib/python3.11/multiprocessing/process.py", line 87, in __init__
self._parent_name = _current_process.name
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 189, in name
@property
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
result = self._tqm.run(play=play)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
exit_code = cli.run()
^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
results = pbex.run()
^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
self._tqm.cleanup()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7f3aa8c75ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
skipping: [min]
skipping: [mohiam]
^C
^C[ERROR]: User interrupted execution
Process WorkerProcess-2202:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 175, in wrap_worker__run
ansible_mitogen.affinity.policy.assign_worker()
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 251, in assign_worker
self._balance('WorkerProcess')
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 230, in _balance
self._set_cpu(descr, self._reserve_shift + (
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 235, in _set_cpu
self._set_affinity(descr, 1 << (cpu % self.cpu_count))
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 220, in _set_affinity
self._set_cpu_mask(mask)
File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 281, in _set_cpu_mask
_sched_setaffinity(tid, len(s), s)
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^C[ERROR]: User interrupted execution
[ERROR]: [mux 2257952] 10:05:09.236672 E mitogen.service: Pool(cb10, size=32, th='mitogen.Pool.cb10.19'): while invoking 'get' of 'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
msg.reply(response)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
[ERROR]: [mux 2257952] 10:05:09.237781 E mitogen.service: While handling Message(0, 136349, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..696) using <bound method Pool._on_service_call of Pool(cb10, size=32, th='mitogen.Pool.cb10.19')>
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
msg.reply(response)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 644, in _worker_run
func(event)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 628, in _on_service_call
msg.reply(mitogen.core.CallError(e))
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
[ERROR]: [mux 2257952] 10:05:09.238474 E mitogen.service: Pool(cb10, size=32, th='mitogen.Pool.cb10.6'): while invoking 'get' of 'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
msg.reply(response)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
[ERROR]: [mux 2257952] 10:05:09.238786 E mitogen.service: While handling Message(0, 137351, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..696) using <bound method Pool._on_service_call of Pool(cb10, size=32, th='mitogen.Pool.cb10.6')>
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
return invoker.invoke(method_name, kwargs, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
msg.reply(response)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.ansible/mitogen/mitogen/service.py", line 644, in _worker_run
func(event)
File "/home/user/.ansible/mitogen/mitogen/service.py", line 628, in _on_service_call
msg.reply(mitogen.core.CallError(e))
File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
(self.router or router).route(msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
self.broker.defer(self._async_route, msg)
File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Thank you for your great work on mitogen, it is very much appreciated 👏
I was not able to reproduce any more busy-looping (red) processes using 9cc0ab0, from #1349 cherry-picked on top of the most recent master fdb5c62.
I don't think you can use just 9cc0ab0 on top of fdb5c62. You will also need 4e86cf448e38897195c8193f9ca17dd9e6774ab5; it adds support for stripping comments inside _first_stage(), and once you have both those commits you might as well use 9cc0ab0 (HEAD of that branch) directly.
I don't think you can use just 9cc0ab0 on top of fdb5c62. You will also need 4e86cf4; it adds support for stripping comments inside _first_stage(), and once you have both those commits you might as well use 9cc0ab0 (HEAD of that branch) directly.
I tested again as you suggested, with the same result: leftover idle processes on the control node; the remote processes terminated after I killed the processes on my control node host.
Some errors encountered during Ctrl-C abort testing
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed
Unexpected Exception, this is probably a bug.
<<< caused by >>>
process object is closed
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
play_return = strategy.run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
return mitogen.core._profile_hook('Strategy',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
return func(*args)
^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
lambda: run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
self._queue_task(host, task, task_vars, play_context)
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
return super(StrategyMixin, self)._queue_task(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
worker_prc = WorkerProcess(
^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/queues.py", line 48, in __init__
self._wlock = ctx.Lock()
^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
return Lock(ctx=self.get_context())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
sl = self._semlock = _multiprocessing.SemLock(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
result = self._tqm.run(play=play)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
exit_code = cli.run()
^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
results = pbex.run()
^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
self._tqm.cleanup()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7fd65bb79ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardIn
^CException ignored in: <function _releaseLock at 0x7fe529385ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
ok: [phd-test-aa2]
ok: [phd-test-san09]
[ERROR]: A worker was found in a dead state
^CException ignored in: <function _releaseLock at 0x7f315a921ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
[ERROR]: A worker was found in a dead state
[ERROR]: [mux 3622647] 10:56:31.056254 E mitogen: Broker(da90): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
^CProcess WorkerProcess-317:
[ERROR]: User interrupted execution
Traceback (most recent call last):
File "/usr/lib/python3.11/weakref.py", line 214, in items
v = wr()
^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 307, in _bootstrap
self._after_fork()
File "/usr/lib/python3.11/multiprocessing/process.py", line 342, in _after_fork
util._run_after_forkers()
File "/usr/lib/python3.11/multiprocessing/util.py", line 163, in _run_after_forkers
items = list(_afterfork_registry.items())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/weakref.py", line 212, in items
with _IterationGuard(self):
File "/usr/lib/python3.11/_weakrefset.py", line 27, in __exit__
def __exit__(self, e, t, b):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^CException ignored in: <function _releaseLock at 0x7f74e1385ee0>
Traceback (most recent call last):
File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
def _releaseLock():
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7fb708c2f2e0>
Traceback (most recent call last):
File "/usr/lib/python3.11/weakref.py", line 105, in remove
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/network/tasks/systemd.yml:5:5
3 block:
4
5 - name: disable service NetworkManager
^ column 5
<<< caused by >>>
Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
fatal: [phd-test-aa2]: UNREACHABLE! =>
changed: false
msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
was in-progress. If you feel this is in error, please file a bug. Original error
was: the respondent Context has disconnected'
unreachable: true
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed
Unexpected Exception, this is probably a bug.
<<< caused by >>>
process object is closed
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
play_return = strategy.run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
return mitogen.core._profile_hook('Strategy',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
return func(*args)
^^^^^^^^^^^
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
lambda: run(iterator, play_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
self._queue_task(host, task, task_vars, play_context)
File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
return super(StrategyMixin, self)._queue_task(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
worker_prc = WorkerProcess(
^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/queues.py", line 43, in __init__
self._rlock = ctx.Lock()
^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
return Lock(ctx=self.get_context())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 58, in __init__
kind, value, maxvalue, self._make_name(),
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 117, in _make_name
next(SemLock._rand))
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/tempfile.py", line 293, in __next__
return ''.join(self.rng.choices(self.characters, k=8))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/random.py", line 493, in choices
return [population[floor(random() * n)] for i in _repeat(None, k)]
^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
if worker is None or not worker.is_alive():
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
result = self._tqm.run(play=play)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
exit_code = cli.run()
^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
results = pbex.run()
^^^^^^^^^^
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
self._tqm.cleanup()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
self._cleanup_processes()
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
if worker_prc and worker_prc.is_alive():
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
self._check_closed()
File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
raise ValueError("process object is closed")
ValueError: process object is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f4199eef2e0>
Traceback (most recent call last):
File "/usr/lib/python3.11/weakref.py", line 105, in remove
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/monitoring/tasks/xymon_client.yml:1:3
1 - name: install xymon-client dependencies
^ column 3
<<< caused by >>>
Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff282b1b2e0>
Traceback (most recent call last):
File "/usr/lib/python3.11/weakref.py", line 105, in remove
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt:
ok: [pf-pc20]
[ERROR]: A worker was found in a dead state
[ERROR]: User interrupted execution
[ERROR]: [mux 3735374] 11:07:57.514282 E mitogen.unix: listener: failed to assign identity to PID 3798448: [Errno 32] Broken pipe
No, we removed Mitogen as we had already restructured our Ansible playbooks to run in separate pipeline jobs.
But there still were some leftover idle processes on 19 remote hosts, all of which were connected to processes on my control node host.
Leftover idle processes, output of ps -eo cputimes,stime,pid,ppid,user:12,args:
0 09:48 1478214 1478210 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478217 1478214 root /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478228 1478224 root /usr/bin/python3(mitogen:user@host:2203330)
...
Contemplating why these processes (the fork parent that exec'd python) are left running. I expected they would exit when the fork child was terminated by the unhandled signal: the write side of both pipes would be closed, and stdin of the Python process would return EOF.
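As a self-contained illustration of that expectation (this is not Mitogen code; the pipe, byte count, and error handling are invented for the example): once the write side of a pipe is closed, select() keeps reporting the read end as readable and os.read() returns an empty byte string. A loop that insists on a fixed byte count and never checks for that EOF therefore spins at 100% CPU, whereas checking for an empty read lets the process notice the peer is gone and bail out.

import os
import select

EXPECTED = 10                      # pretend the peer promised 10 bytes

r, w = os.pipe()
os.write(w, b"hello")              # peer sends only part of the data...
os.close(w)                        # ...then goes away, closing the write side

buf = b""
while len(buf) < EXPECTED:
    select.select([r], [], [])     # returns immediately: EOF counts as "readable"
    chunk = os.read(r, EXPECTED - len(buf))
    if not chunk:                  # empty read == EOF; without this check the loop never ends
        raise EOFError("peer closed the pipe after %d bytes" % len(buf))
    buf += chunk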
@moreati It's only a PoC, but this should at least handle the EOF and avoid the process hanging at 100% CPU.
https://github.com/mhartmay/mitogen/commit/7ae83cd5d32c67ac8f06ffefcb40e1399eab5f93
@rda0 Can you please check whether my fix [1] works?
[1] https://github.com/mitogen-hq/mitogen/pull/1389
@mhartmay I have now been using your fix for about a week and I have seen no more leftover processes 👍
Thanks a ton for the feedback!
Hopefully my PR gets merged, but we still need to figure out why the EOF occurs before all the data has been read.
Mitogen 0.3.36 includes Marc's fix for this.