
hanging process with 100% CPU since 0.3.28

Open DanielRaapDev opened this issue 2 months ago • 20 comments

Using Mitogen 0.3.29, we observed some machines with one core at 100% CPU usage. The culprit is a hanging Python process belonging to the Mitogen parent. Because our pipelines run Ansible multiple times, several such processes can accumulate, increasing the CPU load even further.

After some debugging we found that this loop never terminates:

while PREAMBLE_COMPRESSED_LEN-len(C)and select.select([0],[],[]):C+=os.read(0,PREAMBLE_COMPRESSED_LEN-len(C))

strace showed the process making these calls over and over again:

read(0, "", 18277)                      = 0
pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
read(0, "", 18277)                      = 0
pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
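For comparison, here is a minimal sketch of the same read loop with an explicit end-of-file check. It is not Mitogen's actual first stage and not necessarily what any later fix does; the helper name read_exactly and the choice to raise EOFError are assumptions for illustration. The key point is that a read() returning 0 bytes on an fd that select() reported as ready means the writer has gone away, so the loop can stop instead of spinning.

```python
import os
import select


def read_exactly(fd, n):
    """Read exactly n bytes from fd, failing fast on EOF (hypothetical helper).

    Same shape as the one-liner above, but an empty read is treated as
    end-of-file instead of being retried forever.
    """
    buf = b""
    while len(buf) < n:
        # Block until fd is readable; at EOF select() also reports "readable".
        select.select([fd], [], [])
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            # Every write end is closed: no more data will ever arrive.
            raise EOFError("got %d of %d bytes before EOF" % (len(buf), n))
        buf += chunk
    return buf
```

Left uncaught, the EOFError would make the process exit with a non-zero status rather than busy-looping.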

This was not happening on all machines nor for each run of our Ansible pipeline.

Workaround: after reverting to Mitogen 0.3.27, no more problems were seen.

I think this is related to the changes in #1307. Maybe it is because we don't use sudo logging?

Setup: we connect over SSH as an unprivileged user and use passwordless sudo to become root. We run Ansible in a Docker container running an execution environment, so each run uses exactly the same environment.

requirements.txt:

ansible-lint==25.9.2
ansible-navigator==25.9.0
jmespath==1.0.1
lxml==6.0.2
passlib==1.7.4
sarif-tools==3.0.5
mitogen==0.3.29

DanielRaapDev, Oct 30 '25 09:10

For completeness, what is the OS and Python version on the affected machines?

moreati, Oct 30 '25 09:10

Notes to self

pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])

pselect6() asked whether a single fd (fd=0) was ready for reading (i.e. whether read() would not block); the return value indicates that fd=0 is ready.

read(0, "", 18277) = 0

Attempted to read up to 18277 bytes; it returned immediately (i.e. didn't block) with 0 bytes.
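This pair is the classic signature of end-of-file: once every write end of a pipe is closed, select() keeps reporting the read end as ready, and each read() returns 0 bytes immediately. A quick local reproduction, using a plain os.pipe() as a stand-in for whatever actually feeds fd 0 on the affected hosts (an assumption; in practice it may be a pipe or socket set up by ssh/sudo):

```python
import os
import select

# Simulate the feeding end going away: close the write end of a pipe.
r, w = os.pipe()
os.close(w)

results = []
for _ in range(3):
    # Zero timeout: just ask whether r is ready right now.
    readable, _, _ = select.select([r], [], [], 0)
    data = os.read(r, 18277)
    results.append((readable == [r], data))

os.close(r)
print(results)  # [(True, b''), (True, b''), (True, b'')]
```

Each iteration mirrors one pselect6()/read() pair from the strace output above.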

moreati, Oct 30 '25 09:10

Machines are running Ubuntu 24.04 LTS and Debian 12. Python is 3.12.3 or 3.11.2.

DanielRaapDev, Oct 30 '25 09:10

Brainstorming possibilities (with input from claude.ai), in roughly descending order of likelihood by my gut feeling:

  1. EOF - remote end of pipe/socket that feeds fd=0 closed, or was reset
  2. Race condition - something else read the data, between select() returning and calling read()
    1. another part of Mitogen
    2. something in Ansible
    3. something in PAM, ssh, or sudo (e.g. a plugin)
    4. wildcard - e.g. third-party audit, security tool, agent, etc.
  3. select() false positive (I'm skeptical)

moreati, Oct 30 '25 11:10

  1. EOF - remote end of pipe/socket that feeds fd=0 closed, or was reset

If this is the case, does the parent notice? Does it do anything about it? Can it?

@DanielRaapDev

  1. When you observed processes at 100% CPU were any failures or retries reported on the controller?
  2. Do you have any logs or stdout from such an incident?
  3. As a rough estimate how often did you observe it? E.g. 50% of playbook executions, once per 1000 executions
  4. Do you have anything out of the ordinary in your authentication/authorization stack? E.g. single sign-on, policy agents, endpoint management, custom SSH config, sudo plugins, PAM plugins

moreati, Oct 30 '25 15:10

Rather than diagnose/fix the exact cause, fixing the symptom may be more robust. E.g. have the first stage self-destruct if it hasn't reached the second stage within N seconds.
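A minimal sketch of such a self-destruct timer, assuming a POSIX host and that the code runs in the main thread (Python only allows signal handlers to be installed there). It wraps the original, EOF-unaware loop in a real-time timer so the loop cannot outlive the deadline whatever the root cause. The names FirstStageTimeout and read_with_watchdog, and the timeout value, are hypothetical. In a real first stage, simply calling signal.alarm(N) and keeping SIGALRM's default action (terminate the process) would be enough; here a handler raises an exception so the behaviour can be observed.

```python
import os
import select
import signal


class FirstStageTimeout(Exception):
    """Raised when the first stage runs past its deadline (hypothetical)."""


def _on_alarm(signum, frame):
    raise FirstStageTimeout("second stage not reached in time")


def read_with_watchdog(fd, n, timeout_s):
    """Run the original read loop, but never for longer than timeout_s."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)
    try:
        buf = b""
        # Same shape as the one-liner in the report: no EOF check, so at
        # EOF this spins until the timer fires and the handler raises.
        while n - len(buf) and select.select([fd], [], []):
            buf += os.read(fd, n - len(buf))
        return buf
    finally:
        # Disarm the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

The design trade-off is that the deadline must be generous enough for slow links and slow become methods, since a first stage that is merely slow would also be killed.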

moreati, Oct 30 '25 15:10

  1. Good that you asked. I looked in the old logs and found this error:
PLAY [name of our play] *****************
[ERROR]: [mux  38] 09:13:08.341470 E mitogen.[ssh.billing-test.buero.subshell.io]: while importing 'ansible.module_utils.json_utils'
Traceback (most recent call last):
  File "<stdin>", line 1674, in exec_module
  File "master:/usr/local/lib/python3.14/site-packages/ansible/module_utils/json_utils.py", line 27
SyntaxError: future feature annotations is not defined

But the increased CPU usage appeared on a later run, when no such error was in the log 🤔

  2. Nothing unusual here. Our check runs produce very little output because we only print summaries. See ansible.cfg below.

  3. About 50-70% of machines were affected, but only roughly 5-10% of the jobs left such a process behind. So there must be a race condition involved.

  4. No fancy auth here, just SSH public keys with local key files. See ansible.cfg below for custom settings.

[defaults]
interpreter_python=auto_silent
inventory=inventory/hosts.yml
remote_user=sa_ansible
callbacks_enabled = ansible.posix.profile_tasks,ansible.posix.profile_roles
forks=32
# disable HostKeyChecking so Mitogen works on Jenkins Hosts
host_key_checking=False
gathering=smart

[privilege_escalation]
# Default to sudo:
become=True

[ssh_connection]
# Keep SSH parameters in place, so Ansible execution without Mitogen will use better SSH connection
ssh_args = -o ControlMaster=auto -o ControlPersist=600 -o PreferredAuthentications=publickey
# keeping this empty prevents too long unix_socket path error
control_path =
pipelining = True

[callback_profile_roles]
summary_only = True

[callback_profile_tasks]
summary_only = True

Btw, most jobs only run in check mode.

DanielRaapDev, Oct 30 '25 15:10

Rather than diagnose/fix the exact cause, fixing the symptom may be more robust. E.g. have the first stage self-destruct if it hasn't reached the second stage within N seconds.

Yeah, the synchronous call prior to #1307 did not have this issue. So something in the new way of reading the data fails where the previous code detected that case! Or did it never happen before?

DanielRaapDev, Oct 30 '25 15:10

Thanks for your work :)

DanielRaapDev, Oct 30 '25 15:10

Yeah, the synchronous call prior to #1307 did not have this issue. So something in the new way of reading the data fails where the previous code detected that case! Or did it never happen before?

Previously it used a single-shot fp.read(N). So if the input was truncated, zlib.decompress() would most likely have thrown an unhandled exception, and the process(es) would have exited with a non-zero status.
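A small experiment showing both halves of that argument, using an os.pipe() and a made-up payload as stand-ins for the real stdin and compressed preamble (the actual pre-#1307 code is not reproduced here): a buffered single-shot read returns short once EOF is reached, and zlib then refuses the truncated stream with an exception instead of looping.

```python
import os
import zlib

# Hypothetical stand-in for the compressed preamble.
payload = zlib.compress(b"print('second stage')\n" * 200)

# Simulate a truncated transfer: only half the bytes arrive before EOF.
r, w = os.pipe()
os.write(w, payload[: len(payload) // 2])
os.close(w)

with os.fdopen(r, "rb") as fp:
    # A single-shot buffered read stops early (short read) at EOF...
    data = fp.read(len(payload))

try:
    zlib.decompress(data)
except zlib.error as exc:
    # ...and decompressing the truncated stream fails loudly.
    print("zlib.error:", exc)
```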

moreati, Oct 30 '25 15:10

Please could you try https://github.com/mitogen-hq/mitogen/commit/9cc0ab0823b0280f5309d0f102a42f2cceb99b57, from #1349

moreati, Nov 05 '25 20:11

I am able to somewhat reproduce this by aborting ansible-playbook runs (against ~200 hosts) with Ctrl-C.

There seem to be 2 types of "stuck" processes:

  • &yellow: idle processes
  • &red: busy-looping processes

Because of this issue I have some monitoring data to share, showing some of the problematic processes:

stuck process examples
color   cputime runtime        pid       user  cmd
&red       0.1h    0.1h     279437       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow        0.0h      162.3h     732268     732262       root /usr/bin/python3( mitogen:user@host:3700546)
&red         153.1h      162.3h     732269     732268       root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.3h    0.3h    4104182       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red 150.6302777777778h  >10     732269     732268       root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.2h    0.2h      35960       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow        0.0h  >10     732268     732262       root /usr/bin/python3( mitogen:user@host:3700546)
&red    152.79694444444445h  >10     732269     732268       root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   90.6h     348812       root  /usr/bin/python3( mitogen:user@host:712040)
&red      78.3h   90.6h     348813       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.1h    0.1h     663867       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.0h    0.1h    2463164       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   90.6h    2248518       root  /usr/bin/python3( mitogen:user@host:711582)
&red      78.4h   90.6h    2248519       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.1h    0.1h     876922       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h 1022.4h    3222703       root  /usr/bin/python3( mitogen:user@host:2542363)
&red    1011.7h 1022.4h    3222704       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  162.6h    3688708       root  /usr/bin/python3( mitogen:user@host:3700546)
&red     153.4h  162.6h    3688709       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h    2786038       root  /usr/bin/python3( mitogen:user@host:756221)
&red     651.3h  666.7h    2786039       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.1h    0.1h     469758       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h    2038373       root  /usr/bin/python3( mitogen:user@host:757665)
&red     652.1h  666.7h    2038374       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   90.6h    1572053       root  /usr/bin/python3( mitogen:user@host:711582)
&red      78.3h   90.6h    1572055       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h     830797       root  /usr/bin/python3( mitogen:user@host:756221)
&red     652.2h  666.7h     830798       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.2h    0.3h    3344750       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   38.0h    3374195       root  /usr/bin/python3( mitogen:user@host:3092216)
&red      18.8h   38.0h    3374196       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   38.5h    3374195       root  /usr/bin/python3( mitogen:user@host:3092216)
&red      19.3h   38.5h    3374196       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   38.8h    2528754       root  /usr/bin/python3( mitogen:user@host:3092216)
&red      18.8h   38.8h    2528755       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h    1.7h    1100427       root  /usr/bin/python3( mitogen:user@host:668520)
&yellow    0.0h    1.7h    1100432       root  /usr/bin/python3( mitogen:user@host:668520)
&red       0.1h    0.1h    1114730       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.0h    0.0h    3374196       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h    1826395       root  /usr/bin/python3( mitogen:user@host:757665)
&red     632.1h  666.7h    1826396       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.2h    0.2h    1131234       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  858.6h    3021055       root  /usr/bin/python3( mitogen:user@host:57150)
&red     845.8h  858.6h    3021056       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h    1.2h    1100427       root  /usr/bin/python3( mitogen:user@host:668520)
&yellow    0.0h    1.2h    1100432       root  /usr/bin/python3( mitogen:user@host:668520)
&yellow    0.0h  666.7h    3954410       root  /usr/bin/python3( mitogen:user@host:756221)
&red     649.3h  666.7h    3954411       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  162.6h    1483873       root  /usr/bin/python3( mitogen:user@host:3700546)
&red     153.4h  162.6h    1483874       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  714.6h    2180811       root  /usr/bin/python3( mitogen:user@host:7664)
&red     707.0h  714.6h    2180812       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   38.4h     698811       root  /usr/bin/python3( mitogen:user@host:3092216)
&red      19.3h   38.4h     698812       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  782.8h    1569058       root  /usr/bin/python3( mitogen:user@host:756221)
&red     769.2h  782.8h    1569059       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  714.6h    1081190       root  /usr/bin/python3( mitogen:user@host:7664)
&red     706.9h  714.6h    1081191       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h    1012717       root  /usr/bin/python3( mitogen:user@host:756221)
&red     644.1h  666.7h    1012718       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  162.6h    3905890       root  /usr/bin/python3( mitogen:user@host:3700546)
&red     153.4h  162.6h    3905891       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  714.6h     805821       root  /usr/bin/python3( mitogen:user@host:7664)
&red     706.7h  714.6h     805822       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.2h    0.2h     987350       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.2h    0.2h    1409297       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h   90.6h    1874827       root  /usr/bin/python3( mitogen:user@host:711582)
&red      78.4h   90.6h    1874828       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.1h    0.1h    4168476       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.0h    0.1h    1463751       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&red       0.3h    0.3h    2978525       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  162.6h     969013       root  /usr/bin/python3( mitogen:user@host:3700546)
&red     153.4h  162.6h     969014       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))
&yellow    0.0h  666.7h     291047       root  /usr/bin/python3( mitogen:user@host:756221)
&red     646.2h  666.7h     291048       root  /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))

Usually Ctrl-C aborts immediately with ^C[ERROR]: User interrupted execution. But sometimes Mitogen prints red walls of errors or exceptions and I have to hit Ctrl-C multiple times. Rarely, I had to kill the ansible-playbook processes because they never aborted, even after many Ctrl-Cs.

Please could you try 9cc0ab0, from #1349

I am now testing your patch and will report back later.

rda0, Nov 07 '25 08:11

&yellow 0.0h 1.2h 1100427 root /usr/bin/python3( mitogen:user@host:668520)

All the yellow (idle) processes are fork parents with argv /path/to/python(<mitogen parent>). They've exec()d a fresh Python, which is waiting for code to execute on its stdin; that stdin is connected to a pipe shared with its fork child.

&red 1011.7h 1022.4h 3222704 root /usr/bin/python3 -c import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,zlib;exec(zlib.decompress(binascii.a2b_base64\("xxx")))</p>

All the red processes (busy loop, presumed read/select) are fork children. These will be subject to the new timeout.

If the timeout fires it should terminate the fork child (red), closing the pipe file descriptors it holds, in turn causing the fork parent's stdin to close, in turn causing that Python to exit.
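A minimal sketch of that fd chain, using a plain os.pipe() and os.fork() as stand-ins for the real bootstrap (an assumption; it does not reproduce Mitogen's actual first stage). The child holds the write end and exits without writing anything, as if its timeout had fired; the parent, reading "code" from the pipe, then sees EOF instead of waiting forever.

```python
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Fork child ("red"): owns the write end. Simulate the timeout
    # firing before any code was sent: just exit.
    os.close(r)
    os._exit(1)

# Fork parent ("yellow"): must drop its own copy of the write end,
# otherwise it would never see EOF.
os.close(w)
code = os.read(r, 65536)  # returns once the child's copy is closed too
os.close(r)
_, status = os.waitpid(pid, 0)

# Empty read == EOF: a Python reading its program from this stdin
# would get nothing to run and could exit.
print(code, os.WEXITSTATUS(status))  # b'' 1
```

EOF is only delivered once every copy of the write end is closed, in every process that inherited one.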

moreati, Nov 07 '25 09:11

I was no longer able to reproduce busy-looping (red) processes using 9cc0ab0 from https://github.com/mitogen-hq/mitogen/pull/1349, cherry-picked on top of the most recent master (fdb5c625).

But there were still some leftover idle processes on 19 remote hosts, all connected to processes like this one on my control node host:

       0 09:52 2227217 2211716 user ssh -o LogLevel ERROR -l root -o Compression yes -o ServerAliveInterval 30 -o ServerAliveCountMax 10 -o BatchMode yes -o StrictHostKeyChecking yes -o ControlMaster=auto -o PreferredAuthentications=publickey -o ServerAliveInterval=30 -F .ssh_config remote-host /usr/bin/python3 -c 'import sys;sys.path=[p for p in sys.path if p];import binascii,os,select,signal,zlib;exec(zlib.decompress(binascii.a2b_base64("xxx")))'

Leftover idle processes

Output of ps -eo cputimes,stime,pid,ppid,user:12,args:

       0 09:48 1478214 1478210 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1478217 1478214 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1478228 1478224 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1478231 1478228 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1476516 1476512 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1476519 1476516 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1481142 1481138 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1481145 1481142 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1255576 1255572 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1255579 1255576 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261001 1260997 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261004 1261001 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1147805 1147801 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1147808 1147805 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260701 1260697 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260704 1260701 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261065 1261061 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261068 1261065 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260123 1260119 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260126 1260123 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260560 1260556 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1260563 1260560 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1259975 1259971 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1259978 1259975 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261033 1261029 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:48 1261036 1261033 root         /usr/bin/python3(mitogen:user@host:2203330)
       0 09:50 1664891 1664887 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 1664894 1664891 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 1983643 1983639 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 1983646 1983643 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 1302747 1302743 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 1302750 1302747 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 2759687 2759683 root         /usr/bin/python3(mitogen:user@host:2206537)
       0 09:50 2759690 2759687 root         /usr/bin/python3(mitogen:user@host:2206537)

The remote processes terminated after I killed the processes on my control node host.

Some errors encountered during Ctrl-C abort testing
^CException ignored in: <function _after_fork at 0x7f33765f3600>
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1638, in _after_fork
[ERROR]: User interrupted execution
Process WorkerProcess-134:
Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 159, in _detach
    os.setsid()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 176, in wrap_worker__run
    return mitogen.core._profile_hook('WorkerProcess',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
    return func(*args)
           ^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 177, in <lambda>
    lambda: worker__run(self)
            ^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 192, in run
    self._detach()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 177, in _detach
    display.error(f'Could not detach from stdio: {e}')
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
TASK [Gathering Facts] ************************************************************************************************
^CException ignored in: <function _releaseLock at 0x7f7ac756dee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
[ERROR]: [mux  2168469] 09:25:53.157721 E mitogen.unix: listener: failed to assign identity to PID 2168543: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.162541 E mitogen.unix: listener: failed to assign identity to PID 2168544: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178098 E mitogen.unix: listener: failed to assign identity to PID 2168548: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178351 E mitogen.unix: listener: failed to assign identity to PID 2168551: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178432 E mitogen.unix: listener: failed to assign identity to PID 2168556: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178493 E mitogen.unix: listener: failed to assign identity to PID 2168557: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178551 E mitogen.unix: listener: failed to assign identity to PID 2168560: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178612 E mitogen.unix: listener: failed to assign identity to PID 2168563: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178694 E mitogen.unix: listener: failed to assign identity to PID 2168566: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178746 E mitogen.unix: listener: failed to assign identity to PID 2168569: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178792 E mitogen.unix: listener: failed to assign identity to PID 2168574: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178840 E mitogen.unix: listener: failed to assign identity to PID 2168575: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178885 E mitogen.unix: listener: failed to assign identity to PID 2168578: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178931 E mitogen.unix: listener: failed to assign identity to PID 2168581: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.178975 E mitogen.unix: listener: failed to assign identity to PID 2168584: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.179019 E mitogen.unix: listener: failed to assign identity to PID 2168587: [Errno 32] Broken pipe
[ERROR]: [mux  2168469] 09:25:53.179061 E mitogen.unix: listener: failed to assign identity to PID 2168590: [Errno 32] Broken pipe
[ERROR]: Task failed: Connection timed out.
fatal: [space-pc102]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Connection timed out.'
    unreachable: true
ok: [guenther41]
^C[ERROR]: User interrupted execution
[ERROR]: [mux  2167350] 09:25:46.890245 E mitogen: Broker(d290): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
TASK [Gathering Facts] ************************************************************************************************
[ERROR]: [mux  2171668] 09:36:07.256212 E mitogen: Broker(fed0): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
^C[ERROR]: User interrupted execution
TASK [Gathering Facts] ************************************************************************************************
^C[ERROR]: User interrupted execution
Process WorkerProcess-25:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 175, in wrap_worker__run
    ansible_mitogen.affinity.policy.assign_worker()
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 251, in assign_worker
    self._balance('WorkerProcess')
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 230, in _balance
    self._set_cpu(descr, self._reserve_shift + (
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 235, in _set_cpu
    self._set_affinity(descr, 1 << (cpu % self.cpu_count))
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 220, in _set_affinity
    self._set_cpu_mask(mask)
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 281, in _set_cpu_mask
    _sched_setaffinity(tid, len(s), s)
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 317, in _bootstrap
    util._exit_function()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 320, in _exit_function
    def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^C
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed

Unexpected Exception, this is probably a bug.

<<< caused by >>>

process object is closed

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
    play_return = strategy.run(iterator, play_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
    return mitogen.core._profile_hook('Strategy',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
    return func(*args)
           ^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
    lambda: run(iterator, play_context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
    return super(StrategyMixin, self)._queue_task(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
    worker_prc = WorkerProcess(
                 ^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
    self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 43, in __init__
    self._rlock = ctx.Lock()
                  ^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 58, in __init__
    kind, value, maxvalue, self._make_name(),
                           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 117, in _make_name
    next(SemLock._rand))
    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tempfile.py", line 293, in __next__
    return ''.join(self.rng.choices(self.characters, k=8))
                   ^^^^^^^^
  File "/usr/lib/python3.11/tempfile.py", line 281, in rng
    @property
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
    result = self._tqm.run(play=play)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
    exit_code = cli.run()
                ^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
    results = pbex.run()
              ^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
    self._tqm.cleanup()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
    raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7f528716dee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
ok: [guenther10]
^C
-rda:~/git/ansible-repo[±]$ [ERROR]: [mux  2203330] 09:48:52.032133 E mitogen.service: Pool(8ed0, size=32, th='mitogen.Pool.8ed0.14'): while invoking 'propagate_paths_and_modules' of 'mitogen.service.PushFileService'
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 305, in invoke
    response = self._invoke(method_name, kwargs, msg)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 291, in _invoke
    ret = method(**kwargs)
          ^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 758, in propagate_paths_and_modules
    self.propagate_to(context, mitogen.core.to_text(path), overridden_source)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 793, in propagate_to
    self._forward(context, path)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 712, in _forward
    child = self.router.context_by_id(stream.protocol.remote_id)
                                      ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'protocol'

^C
^C[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/blockdev/tasks/lvm.yml:50:3

48     label: "vg: {{ lv.0.name }}, {{ lv.1 }}"
49
50 - name: deploy lv filesystems
     ^ column 3

<<< caused by >>>

Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

failed: [funkadelic] (item=vg: vg0, {'key': 'tmp', 'value': {'mount': '/tmp', 'size': '10G', 'mode': '1777'}}) => 
    ansible_loop_var: lv
    changed: false
    lv:
    -   lvs:
        -   key: root
            value:
                mount: /
                mount_options: noatime,nodiratime,errors=remount-ro
                mount_pass: '1'
                size: 10G
        -   key: log
            value:
                mount: /var/log
                size: 2G
        -   key: tmp
            value:
                mode: '1777'
                mount: /tmp
                size: 10G
        -   key: swap
            value:
                fs: swap
                mount: none
                mount_options: defaults
                mount_pass: '0'
                size: 4G
        -   key: scr
            value:
                fs_options: -m 0
                group: 1893
                mode: '1770'
                mount: /scratch
                owner: 1893
                size: 30G
        -   key: home
            value:
                mount: /home
                size: 30G
        name: vg0
        pv_id: ata-APPLE_SSD_SM0256G_S2PANYAGB03426-part2
    -   key: tmp
        value:
            mode: '1777'
            mount: /tmp
            size: 10G
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
[ERROR]: User interrupted execution
^CException ignored in: <function _releaseLock at 0x7f5250f65ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
[ERROR]: Task failed: Channel was disconnected while connection attempt was in progress; this may be caused by an abnormal Ansible exit, or due to an unreliable target.
fatal: [guenther55]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
        this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
    unreachable: true
fatal: [guenther54]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
        this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
    unreachable: true
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

Task failed.

<<< caused by >>>

Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

fatal: [guenther51]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
fatal: [guenther52]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
fatal: [guenther49]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
fatal: [guenther53]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Channel was disconnected while connection attempt was in progress;
        this may be caused by an abnormal Ansible exit, or due to an unreliable target.'
    unreachable: true
fatal: [guenther50]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
[ERROR]: Task failed: EOF on stream; last 100 lines received:
MITO000

fatal: [guenther56]: UNREACHABLE! => 
    changed: false
    msg: |-
        Task failed: EOF on stream; last 100 lines received:
        MITO000
    unreachable: true
ok: [guenther58]
ok: [guenther59]
^C
^C[ERROR]: User interrupted execution
[ERROR]: [mux  2211812] 09:52:27.265308 E mitogen.service: Pool(92d0, size=32, th='mitogen.Pool.92d0.1'): while invoking 'propagate_paths_and_modules' of 'mitogen.service.PushFileService'
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 305, in invoke
    response = self._invoke(method_name, kwargs, msg)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 291, in _invoke
    ret = method(**kwargs)
          ^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 758, in propagate_paths_and_modules
    self.propagate_to(context, mitogen.core.to_text(path), overridden_source)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 793, in propagate_to
    self._forward(context, path)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 712, in _forward
    child = self.router.context_by_id(stream.protocol.remote_id)
                                      ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'protocol'
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed

Unexpected Exception, this is probably a bug.

<<< caused by >>>

process object is closed

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
    play_return = strategy.run(iterator, play_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
    return mitogen.core._profile_hook('Strategy',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
    return func(*args)
           ^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
    lambda: run(iterator, play_context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
    return super(StrategyMixin, self)._queue_task(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
    worker_prc = WorkerProcess(
                 ^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 86, in __init__
    super(WorkerProcess, self).__init__()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 87, in __init__
    self._parent_name = _current_process.name
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 189, in name
    @property
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
    result = self._tqm.run(play=play)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
    exit_code = cli.run()
                ^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
    results = pbex.run()
              ^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
    self._tqm.cleanup()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
    raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7f3aa8c75ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
skipping: [min]
skipping: [mohiam]
^C
^C[ERROR]: User interrupted execution
Process WorkerProcess-2202:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 175, in wrap_worker__run
    ansible_mitogen.affinity.policy.assign_worker()
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 251, in assign_worker
    self._balance('WorkerProcess')
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 230, in _balance
    self._set_cpu(descr, self._reserve_shift + (
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 235, in _set_cpu
    self._set_affinity(descr, 1 << (cpu % self.cpu_count))
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 220, in _set_affinity
    self._set_cpu_mask(mask)
  File "/home/user/.ansible/mitogen/ansible_mitogen/affinity.py", line 281, in _set_cpu_mask
    _sched_setaffinity(tid, len(s), s)
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^C[ERROR]: User interrupted execution
[ERROR]: [mux  2257952] 10:05:09.236672 E mitogen.service: Pool(cb10, size=32, th='mitogen.Pool.cb10.19'): while invoking 'get' of 'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
    msg.reply(response)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.

[ERROR]: [mux  2257952] 10:05:09.237781 E mitogen.service: While handling Message(0, 136349, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..696) using <bound method Pool._on_service_call of Pool(cb10, size=32, th='mitogen.Pool.cb10.19')>
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
    msg.reply(response)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 644, in _worker_run
    func(event)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 628, in _on_service_call
    msg.reply(mitogen.core.CallError(e))
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.

[ERROR]: [mux  2257952] 10:05:09.238474 E mitogen.service: Pool(cb10, size=32, th='mitogen.Pool.cb10.6'): while invoking 'get' of 'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
    msg.reply(response)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.

[ERROR]: [mux  2257952] 10:05:09.238786 E mitogen.service: While handling Message(0, 137351, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..696) using <bound method Pool._on_service_call of Pool(cb10, size=32, th='mitogen.Pool.cb10.6')>
Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 619, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 307, in invoke
    msg.reply(response)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 644, in _worker_run
    func(event)
  File "/home/user/.ansible/mitogen/mitogen/service.py", line 628, in _on_service_call
    msg.reply(mitogen.core.CallError(e))
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 962, in reply
    (self.router or router).route(msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3549, in route
    self.broker.defer(self._async_route, msg)
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 3029, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
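Two errors recur throughout the tracebacks above: `AssertionError: can only test a child process` and `ValueError: process object is closed`. Both come from `multiprocessing.Process.is_alive()`. The tracebacks suggest that Ansible's `_signal_handler` in `task_queue_manager.py` is still installed inside forked `WorkerProcess` children. When Ctrl-C arrives, it calls `is_alive()` on worker objects that were created by the parent process, or on objects that have already been closed. The following is a minimal standalone sketch that reproduces both messages outside Ansible. It assumes a POSIX system so that the `fork` start method is available. The helper names are hypothetical, and nothing here is Ansible or Mitogen code.

```python
import multiprocessing as mp


def _noop():
    pass


def is_alive_from_other_process():
    """Call is_alive() on a Process object from a process that did not create it."""
    ctx = mp.get_context("fork")  # POSIX only, like Ansible's forked workers
    sibling = ctx.Process(target=_noop)  # created (never started) in this process
    results = ctx.Queue()

    def probe():
        # Runs in a forked child, where sibling._parent_pid != os.getpid().
        try:
            sibling.is_alive()
            results.put("no error")
        except AssertionError as exc:
            results.put(str(exc))

    child = ctx.Process(target=probe)
    child.start()
    message = results.get(timeout=10)
    child.join()
    return message


def is_alive_after_close():
    """Call is_alive() on a Process object after close() has released it."""
    proc = mp.get_context("fork").Process(target=_noop)
    proc.start()
    proc.join()
    proc.close()  # allowed once the process has finished
    try:
        proc.is_alive()
    except ValueError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(is_alive_from_other_process())  # can only test a child process
    print(is_alive_after_close())         # process object is closed
```

These errors appear to be noise from the interrupted shutdown and not the cause of the leftover processes. That reading is an inference from the tracebacks alone.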

Thank you for your great work on Mitogen; it is very much appreciated 👏

rda0, Nov 07 '25 10:11

I was no longer able to reproduce the busy-looping (red) processes using 9cc0ab0 from #1349, cherry-picked on top of the most recent master, fdb5c62.
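The busy loop refers to the first-stage read loop quoted at the top of this issue. When the parent closes its end of the pipe, `select()` keeps reporting fd 0 as readable and `os.read()` keeps returning `b""`. The loop therefore never makes progress, which matches the repeating `pselect6`/`read(0, "", 18277) = 0` pairs in the strace. The following is a minimal sketch of an EOF-aware version of that loop. `read_exactly` is a hypothetical name, and this is not necessarily how #1349 fixes it.

```python
import os
import select


def read_exactly(fd, size):
    """Read until `size` bytes have arrived or the writer closes the pipe."""
    buf = b""
    while len(buf) < size:
        # Wait for fd to become readable. At EOF, select() reports the fd as
        # readable on every call, so it cannot detect EOF by itself.
        select.select([fd], [], [])
        chunk = os.read(fd, size - len(buf))
        if not chunk:
            # os.read() returned b"": EOF. Stop instead of repeating the
            # select()/read() pair forever.
            break
        buf += chunk
    return buf


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"partial")
    os.close(w)  # simulate the parent disappearing mid-transfer
    data = read_exactly(r, 18277)
    os.close(r)
    print(len(data))  # 7: returns a short buffer instead of spinning
```

The caller would then compare `len(data)` with the expected length and exit on a short read. Exiting is an assumption about the desired behaviour.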

I don't think you can apply just 9cc0ab0 on top of fdb5c62. You will also need 4e86cf448e38897195c8193f9ca17dd9e6774ab5, which adds support for stripping comments inside _first_stage(). Once you have both of those commits, you might as well use 9cc0ab0 (the HEAD of that branch) directly.

moreati, Nov 07 '25 21:11

I don't think you can apply just 9cc0ab0 on top of fdb5c62. You will also need 4e86cf4, which adds support for stripping comments inside _first_stage(). Once you have both of those commits, you might as well use 9cc0ab0 (the HEAD of that branch) directly.

I tested again as you suggested, with the same result: leftover idle processes remained on the control node, and the remote processes terminated after I killed the processes on my control node host.
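The original report describes processes spinning at 100% CPU, while this test leaves idle processes behind. One way to tell the two apart is the process state in `/proc/<pid>/stat`: a busy loop usually shows up as `R` (running), and an idle process usually shows up as `S` (sleeping). The following is a minimal Linux-only sketch that lists matching processes with their state. It assumes that the processes of interest have `mitogen` somewhere in their command line; adjust the pattern if they do not. `find_processes` and `parse_state` are hypothetical helpers, not part of Mitogen or Ansible. `ps -eo pid,stat,args` reports the same information.

```python
import os


def parse_state(stat_text):
    """Return the state letter (R, S, D, Z, ...) from /proc/<pid>/stat contents."""
    # Field 2 is the command name in parentheses and may itself contain
    # spaces or ')', so parse from the last ')' onwards.
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def find_processes(pattern, proc="/proc"):
    """Return (pid, state, cmdline) for processes whose cmdline contains pattern.

    Linux only: relies on the /proc filesystem.
    """
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        base = os.path.join(proc, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                raw = f.read()
            with open(os.path.join(base, "stat"), errors="replace") as f:
                state = parse_state(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited during the scan or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            found.append((int(entry), state, cmdline))
    return found


if __name__ == "__main__":
    # An "R" that persists across runs suggests a busy loop; "S" is sleeping/idle.
    for pid, state, cmdline in find_processes("mitogen"):
        print(pid, state, cmdline[:100])
```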

Some errors encountered during Ctrl-C abort testing
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed

Unexpected Exception, this is probably a bug.

<<< caused by >>>

process object is closed

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
    play_return = strategy.run(iterator, play_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
    return mitogen.core._profile_hook('Strategy',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
    return func(*args)
           ^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
    lambda: run(iterator, play_context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
    return super(StrategyMixin, self)._queue_task(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
    worker_prc = WorkerProcess(
                 ^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
    self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 48, in __init__
    self._wlock = ctx.Lock()
                  ^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
    result = self._tqm.run(play=play)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
    exit_code = cli.run()
                ^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
    results = pbex.run()
              ^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
    self._tqm.cleanup()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
    raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
ansible.errors.AnsibleError: Unexpected Exception, this is probably a bug: process object is closed
^CException ignored in: <function _releaseLock at 0x7fd65bb79ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardIn



^CException ignored in: <function _releaseLock at 0x7fe529385ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
ok: [phd-test-aa2]
ok: [phd-test-san09]
[ERROR]: A worker was found in a dead state
^CException ignored in: <function _releaseLock at 0x7f315a921ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
[ERROR]: A worker was found in a dead state
[ERROR]: [mux  3622647] 10:56:31.056254 E mitogen: Broker(da90): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
^CProcess WorkerProcess-317:
[ERROR]: User interrupted execution
Traceback (most recent call last):
  File "/usr/lib/python3.11/weakref.py", line 214, in items
    v = wr()
        ^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 307, in _bootstrap
    self._after_fork()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 342, in _after_fork
    util._run_after_forkers()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 163, in _run_after_forkers
    items = list(_afterfork_registry.items())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/weakref.py", line 212, in items
    with _IterationGuard(self):
  File "/usr/lib/python3.11/_weakrefset.py", line 27, in __exit__
    def __exit__(self, e, t, b):
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
^CException ignored in: <function _releaseLock at 0x7f74e1385ee0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7fb708c2f2e0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/weakref.py", line 105, in remove
    def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):

  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/network/tasks/systemd.yml:5:5

3   block:
4
5   - name: disable service NetworkManager
      ^ column 5

<<< caused by >>>

Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

fatal: [phd-test-aa2]: UNREACHABLE! => 
    changed: false
    msg: 'Task failed: Mitogen was disconnected from the remote environment while a call
        was in-progress. If you feel this is in error, please file a bug. Original error
        was: the respondent Context has disconnected'
    unreachable: true
^C[ERROR]: Unexpected Exception, this is probably a bug: process object is closed

Unexpected Exception, this is probably a bug.

<<< caused by >>>

process object is closed

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 386, in run
    play_return = strategy.run(iterator, play_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 348, in run
    return mitogen.core._profile_hook('Strategy',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/mitogen/core.py", line 676, in _profile_hook
    return func(*args)
           ^^^^^^^^^^^
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 349, in <lambda>
    lambda: run(iterator, play_context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/linear.py", line 195, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/home/user/.ansible/mitogen/ansible_mitogen/strategy.py", line 319, in _queue_task
    return super(StrategyMixin, self)._queue_task(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/plugins/strategy/__init__.py", line 376, in _queue_task
    worker_prc = WorkerProcess(
                 ^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/process/worker.py", line 101, in __init__
    self.worker_queue = WorkerQueue(ctx=multiprocessing_context)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 43, in __init__
    self._rlock = ctx.Lock()
                  ^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 58, in __init__
    kind, value, maxvalue, self._make_name(),
                           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 117, in _make_name
    next(SemLock._rand))
    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tempfile.py", line 293, in __next__
    return ''.join(self.rng.choices(self.characters, k=8))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/random.py", line 493, in choices
    return [population[floor(random() * n)] for i in _repeat(None, k)]
                                                     ^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 213, in _signal_handler
    if worker is None or not worker.is_alive():
                             ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 188, in run
    result = self._tqm.run(play=play)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 389, in run
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 660, in cli_executor
    exit_code = cli.run()
                ^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/playbook.py", line 153, in run
    results = pbex.run()
              ^^^^^^^^^^
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/playbook_executor.py", line 252, in run
    self._tqm.cleanup()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 404, in cleanup
    self._cleanup_processes()
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 422, in _cleanup_processes
    if worker_prc and worker_prc.is_alive():
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 157, in is_alive
    self._check_closed()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 101, in _check_closed
    raise ValueError("process object is closed")
ValueError: process object is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/cli/__init__.py", line 669, in cli_executor
    raise AnsibleError("Unexpected Exception, this is probably a bug.") from ex
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f4199eef2e0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/weakref.py", line 105, in remove
    def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):

  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
[ERROR]: Task failed: Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected

Task failed.
Origin: /home/user/git/ansible-repo/roles-shared/monitoring/tasks/xymon_client.yml:1:3

1 - name: install xymon-client dependencies
    ^ column 3

<<< caused by >>>

Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected
^CException ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff282b1b2e0>
Traceback (most recent call last):
  File "/usr/lib/python3.11/weakref.py", line 105, in remove
    def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):

  File "/home/user/env/pyenv/lib/python3.11/site-packages/ansible/executor/task_queue_manager.py", line 226, in _signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt: 
ok: [pf-pc20]
[ERROR]: A worker was found in a dead state
[ERROR]: User interrupted execution
[ERROR]: [mux  3735374] 11:07:57.514282 E mitogen.unix: listener: failed to assign identity to PID 3798448: [Errno 32] Broken pipe

rda0 avatar Nov 11 '25 11:11 rda0

@DanielRaapDev did you try the branch for #1349? (HEAD was 9cc0ab0, now 03a0a15)

moreati avatar Nov 27 '25 11:11 moreati

No, we removed Mitogen as we had already restructured our Ansible playbooks to run in separate pipeline jobs.

DanielRaapDev avatar Nov 27 '25 12:11 DanielRaapDev

But there were still some leftover idle processes on 19 remote hosts, all of them connected to processes on my control node host.

Leftover idle processes (output of ps -eo cputimes,stime,pid,ppid,user:12,args):

0 09:48 1478214 1478210 root         /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478217 1478214 root         /usr/bin/python3(mitogen:user@host:2203330)
0 09:48 1478228 1478224 root         /usr/bin/python3(mitogen:user@host:2203330)
...
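To see which control-node process each leftover belongs to, the ps lines can be grouped by the number at the end of the process title. This is a minimal Python sketch. It assumes the title always has the form python3(mitogen:USER@HOST:PID) seen above, and that the trailing number identifies the process on the control node, which fits the observation that the leftovers were connected to processes there. group_leftovers and TITLE_RE are hypothetical names and are not part of Mitogen:

```python
import re
from collections import defaultdict

# Matches process titles like "/usr/bin/python3(mitogen:user@host:2203330)".
# Assumption: the trailing number is the PID of the controlling process
# on the control node.
TITLE_RE = re.compile(r"\(mitogen:(?P<origin>[^)]*):(?P<ctl_pid>\d+)\)$")


def group_leftovers(ps_lines):
    """Group Mitogen processes by (origin, controller PID).

    Each line is expected in "ps -eo cputimes,stime,pid,ppid,user:12,args"
    layout: cputimes, stime, pid, ppid, user, then the full command line.
    """
    groups = defaultdict(list)
    for line in ps_lines:
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # header, "..." or otherwise incomplete line
        _cputime, _stime, pid, ppid, user, args = fields
        match = TITLE_RE.search(args)
        if match is None:
            continue  # not a Mitogen process title
        key = (match.group("origin"), int(match.group("ctl_pid")))
        groups[key].append((int(pid), int(ppid), user))
    return dict(groups)
```

Each resulting key then names one controller process that can be looked up on the control node.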

I'm contemplating why these processes (the fork parent that exec'd Python) are left running. I expected them to exit: when the fork child was terminated by the unhandled signal, the write side of both pipes would be closed, so stdin of the Python process would return EOF.

moreati avatar Nov 27 '25 14:11 moreati
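The expected behaviour matches the strace in the original report. Once every write end of a pipe is closed, select() keeps reporting the read end as ready, and read() keeps returning 0 bytes. This minimal Python sketch demonstrates it on a POSIX system. read_after_writer_closed is a hypothetical helper, and 18277 is just the read size taken from the strace:

```python
import os
import select


def read_after_writer_closed():
    """Show what a reader sees once the only write end of a pipe is closed."""
    r, w = os.pipe()
    os.write(w, b"partial")  # fewer bytes than the reader will ask for
    os.close(w)              # last writer gone: EOF follows the buffered data

    results = []
    for _attempt in range(3):
        readable, _, _ = select.select([r], [], [])
        # select() reports the fd as readable both for data and for EOF.
        chunk = os.read(r, 18277)
        results.append((r in readable, chunk))
    os.close(r)
    return results
```

After the buffered bytes are consumed, every further iteration gets (True, b""). A loop that only waits on select() and never checks for an empty read therefore spins forever once the writer is gone, which is the 100% CPU pattern from the strace.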

@moreati It's only a PoC, but this should at least handle the EOF and keep the process from hanging at 100% CPU.

https://github.com/mhartmay/mitogen/commit/7ae83cd5d32c67ac8f06ffefcb40e1399eab5f93

@rda0 Can you please check whether my fix [1] works?

[1] https://github.com/mitogen-hq/mitogen/pull/1389

mhartmay avatar Dec 04 '25 16:12 mhartmay
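The general idea of handling the EOF can be sketched as a read loop that stops when read() returns no data, instead of waiting on select() again. This is a hypothetical Python helper written for illustration. read_exactly is not a Mitogen function, and this is not the code from the linked commit or PR:

```python
import os
import select


def read_exactly(fd, length):
    """Read exactly `length` bytes from `fd`, failing fast on EOF.

    Hypothetical helper: it shows a 0-byte read being treated as EOF.
    """
    buf = b""
    while len(buf) < length:
        select.select([fd], [], [])  # block until readable (data or EOF)
        chunk = os.read(fd, length - len(buf))
        if not chunk:  # 0 bytes: every writer has closed its end
            raise EOFError(
                "got %d of %d bytes before EOF" % (len(buf), length))
        buf += chunk
    return buf
```

Raising on a short read turns the silent spin into an error the caller can report. It does not by itself explain why the EOF arrives before the full preamble has been read.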

@mhartmay I have been using your fix for about a week now and have seen no more leftover processes 👍

rda0 avatar Dec 17 '25 10:12 rda0

Thanks a ton for the feedback!

Hopefully my PR gets merged, but we still need to figure out why the EOF occurs before all the data has been read.

mhartmay avatar Dec 17 '25 16:12 mhartmay

Mitogen 0.3.36 includes Marc's fix for this.

moreati avatar Dec 18 '25 18:12 moreati