become fails on solaris
When using mitogen with recent SmartOS (Solaris/illumos), it works as expected (and super fast) as long as I don't use become. If I use become, an error occurs in tcgetattr on the pty master; if I avoid calling that for the master, that problem goes away (obviously), but mitogen then times out waiting for the MITO000 prompt.
I'm relatively new to mitogen, very experienced in Ansible and Python (I've submitted upstream patches to ansible and cpython) and can provide a test environment if necessary. If you can point me to some diagnostics or other thoughts, I'm interested in getting this to work. It's seemingly so close, but I'm not sure of the differences between this working without sudo and not working with sudo.
-
Which version of Ansible are you running? 2.12.5
-
Is your version of Ansible patched in any way? no
-
Are you running with any custom modules, or module_utils loaded? no
-
Have you tried the latest master version from Git? yes
-
Do you have some idea of what the underlying problem may be? https://mitogen.networkgenomics.com/ansible_detailed.html#common-problems has instructions to help figure out the likely cause and how to gather relevant logs.
It looks to be a difference in pty handling after openpty. The first issue was a call (in disable_echo) to old = termios.tcgetattr(fd), which for the fd in question (the master) raises termios.error: (22, 'Invalid argument').
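To check whether the tcgetattr failure reproduces outside mitogen, a standalone probe along these lines might help. This is only a sketch: probe_tcgetattr and probe_pty are hypothetical helper names, not part of mitogen, and the probe only reports what termios does on a fresh pty pair from os.openpty.

```python
import os
import termios


def probe_tcgetattr(fd):
    """Return (True, None) if tcgetattr works on fd, else (False, error args)."""
    try:
        termios.tcgetattr(fd)
    except termios.error as exc:
        # exc.args is normally (errno, message), e.g. (22, 'Invalid argument')
        return False, exc.args
    return True, None


def probe_pty():
    """Open a fresh pty pair and probe both ends (hypothetical helper)."""
    master_fd, slave_fd = os.openpty()
    try:
        return {
            "master": probe_tcgetattr(master_fd),
            "slave": probe_tcgetattr(slave_fd),
        }
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

Running print(probe_pty()) on both the controller and the target should show whether the master side alone is rejected, independent of anything mitogen does.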
If I put a try:except: around the call to disable_echo for master, the command times out waiting for the MITO000 prompt.
master_fp = os.fdopen(master_fd, 'r+b', 0)
slave_fp = os.fdopen(slave_fd, 'r+b', 0)
try:
    disable_echo(master_fd)
except:
    pass
disable_echo(slave_fd)
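The bare except above hides every failure, so a narrower variant may be safer while experimenting. The sketch below is an assumption-laden stand-in, not mitogen's disable_echo: try_disable_echo is a hypothetical name, it only clears the ECHO flag, and it tolerates only the EINVAL case reported above while re-raising anything else.

```python
import errno
import termios


def try_disable_echo(fd):
    """Clear ECHO on fd; return False if termios rejects the fd with EINVAL.

    Hypothetical stand-in for experimentation, not mitogen's disable_echo().
    """
    try:
        attrs = termios.tcgetattr(fd)
    except termios.error as exc:
        if exc.args[0] == errno.EINVAL:
            # The case seen on the pty master here: skip instead of failing.
            return False
        raise
    attrs[3] &= ~termios.ECHO  # index 3 holds the local-mode flags (lflag)
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return True
```

The return value makes it visible in logs which fd was skipped, which a silent pass does not.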
-
Mention your host and target OS and versions SmartOS 21.4 (most recent patches) on both sides.
-
Mention your host and target Python versions Python 3.9.13 on both
-
If reporting a crash or hang in Ansible, please rerun with -vvv and include 200 lines of output around the point of the error, along with a full copy of any traceback or error text in the log. Beware "-vvv" may include secret data! Edit as necessary before posting.
Attached as pre_patch and post_patch (with the try:except around disable_echo(master_fd))
- If reporting any kind of problem with Ansible, please include the Ansible version along with output of "ansible-config dump --only-changed".
DEFAULT_ROLES_PATH(/root/ansible-web/ansible.cfg) = ['/root/ansible']
DEFAULT_STRATEGY(/root/ansible-web/ansible.cfg) = mitogen_linear
DEFAULT_STRATEGY_PLUGIN_PATH(/root/ansible-web/ansible.cfg) = ['/root/mitogen/ansible_mitogen/plugins/strategy']
DEFAULT_VAULT_PASSWORD_FILE(env: ANSIBLE_VAULT_PASSWORD_FILE) = /root/.vault_password
HOST_KEY_CHECKING(/root/ansible-web/ansible.cfg) = False
NETCONF_SSH_CONFIG(env: ANSIBLE_NETCONF_SSH_CONFIG) = True
I made a little more progress investigating this. If I detect Solaris and don't run
fcntl.ioctl(2, termios.TIOCSCTTY)
in _acquire_controlling_tty then that seems to solve the pty problem (in concert with the previous patch). However, now I'm running into permissions issues during what appears to be the command bootstrap:
In this case, the original SSH is as root and the sudo is to a lower-privileged account so that a subset of commands can run as the processes that will be executing them once the system goes live.
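The platform check around the TIOCSCTTY ioctl described above can be sketched roughly as follows. The helper names is_solaris and maybe_set_controlling_tty are hypothetical, not mitogen's API; the sketch assumes that sys.platform starts with "sunos" on Solaris and illumos derivatives such as SmartOS.

```python
import fcntl
import sys
import termios


def is_solaris(platform=None):
    """True when the platform string looks like Solaris/illumos ("sunos5")."""
    platform = sys.platform if platform is None else platform
    return platform.startswith("sunos")


def maybe_set_controlling_tty(fd, platform=None):
    """Issue TIOCSCTTY on fd unless running on Solaris (hypothetical helper).

    Returns True if the ioctl was issued, False if it was skipped.
    """
    if is_solaris(platform):
        return False
    fcntl.ioctl(fd, termios.TIOCSCTTY)
    return True
```

The platform argument exists only so the check can be exercised on a non-Solaris machine; in real use it would be left at its default.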
Continuing on the exploration here. The command that I was running was django_manage and if I run that command using:
ansible -i inventory host -m django_manage -a 'arguments' --become-user=www -b -vvv
I don't receive the error from after_ioctl_patch.txt above.
However, running an ansible-playbook with nothing but a command to echo a string results in the same error as in after_ioctl_patch.txt; it looks like it might be related to having had setup run, so I tried the same script with gather_facts: no and that didn't throw the error, either.
The same scripts, unmodified, run fine without mitogen, so there's a mitogen-specific element here, and I would imagine this works just fine on non-Solaris OSes. Based on some other similar error messages, I'm expecting this has something to do with the execution environment of SmartOS when run for the first time using sudo under mitogen. Since it doesn't fail in the simple case of running the command ad hoc without running setup, I suspect this is related to the setup (gather_facts) module running in the sudo environment when the privileges are being reduced.
Given the complexities of the caching and execution environments, I'm open to suggestions on where to focus next. Any pointers appreciated!