struct.error: 'q' format requires -9223372036854775808 <= number <= 9223372036854775807
👋 Hey,
I was experimenting with mitogen today, and ran into the following issue on one of our devices:
ERROR! [mux 38874] 12:56:57.828568 E mitogen.[ssh.lisbon-wago-pfc200.netbird.cloud]: ExternalContext.main() crashed
Traceback (most recent call last):
File "<stdin>", line 4184, in main
File "<stdin>", line 3921, in run
File "<stdin>", line 675, in _profile_hook
File "<stdin>", line 3904, in _dispatch_calls
File "<stdin>", line 1233, in __iter__
File "<stdin>", line 1219, in get
File "<stdin>", line 2831, in get
File "<stdin>", line 2795, in _make_cookie
struct.error: 'q' format requires -9223372036854775808 <= number <= 9223372036854775807
This happens when gathering facts on a normal playbook. I've tested this playbook with mitogen on different hosts and it ran normally.
- Do you have some idea of what the underlying problem may be?
I suspect it has something to do with this being a musl-linked Python interpreter, but I can't really confirm it. I've attached strace logs (following the instructions on the website) as well as ansible logs with the verbose flags.
The logs are from a run on the latest git commit of mitogen (e8005ece3ab39dacefdae81517610a9cd1ed6312), and the error seems to point to this line (https://github.com/mitogen-hq/mitogen/blob/e8005ece3ab39dacefdae81517610a9cd1ed6312/mitogen/core.py#L2795-L2796).
I haven't run into any incompatibilities when running ansible (without mitogen) on this host otherwise.
- Which version of Ansible are you running?
ansible [core 2.17.6]
- Is your version of Ansible patched in any way?
It's installed from homebrew but I don't think they carry any patches
- Are you running with any custom modules, or module_utils loaded?
No
- Have you tried the latest master version from Git?
Yes!
- Mention your host and target OS and versions
Host: MacOS Sonoma 14.6.1
Target: PTXDist 4.6.1 (Linux bad 5.15.107-rt62-w04.03.06 #1 PREEMPT_RT Wed Oct 23 17:06:52 UTC 2024 armv7l GNU/Linux)
- Mention your host and target Python versions
Host Python: Python 3.12.7 (Installed from homebrew)
Target Python: Python 3.13.1 (Built against musl libc)
Attached Files:
ansible-verbose.txt ansible-config-changed.txt strace-python.12959.12963.txt strace-python.12959.12964.txt strace-python.12959.12986.txt
It looks like thread.get_ident() on this device returns a huge number...
root@bad:~ python3
Python 3.13.1 (main, Dec 10 2024, 03:04:53) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread as thread
>>> thread.get_ident()
18446744072484806452
This may also be specific to Python 3.13 / 3.13.1. I installed Python 3.12.8 and I now get reasonable numbers out of get_ident():
root@bad:~ python3
Python 3.12.8 (main, Dec 7 2024, 05:56:13) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread as thread
>>> thread.get_ident()
3069599540
With this version of Python mitogen now runs without any issue.
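For reference, the failure can be reproduced without mitogen. The following is a minimal sketch that assumes only the value printed by get_ident() above and the standard `q` (signed 64-bit) and `Q` (unsigned 64-bit) struct formats. It does not call any mitogen code.

```python
import struct

# Value returned by thread.get_ident() on the affected device (from the session above).
ident = 18446744072484806452

I64_MAX = 2**63 - 1
U64_MAX = 2**64 - 1

# The value is too large for a signed 64-bit field but still fits an unsigned one.
fits_signed = ident <= I64_MAX
fits_unsigned = ident <= U64_MAX

try:
    struct.pack('>q', ident)
    packed_signed = True
except struct.error:
    # Same exception type as in the traceback from _make_cookie.
    packed_signed = False

packed_unsigned = struct.pack('>Q', ident)

print(fits_signed, fits_unsigned, packed_signed, len(packed_unsigned))
```

So any get_ident() value at or above 2**63 would be enough to trigger the error, whatever the reason the interpreter produces it.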
Notes
- {thread,_thread,threading}.get_ident() returns a non-zero integer intended for use within Python. It's not an OS thread id.
- threading.get_native_id() (Python 3.8+) returns an OS assigned thread id. https://docs.python.org/3/library/threading.html#threading.get_native_id
- COOKIE_FMT = '>Qqqq' is a u64, then 3 * i64. Q and q are the largest available formats as of Python 3.13.
- the cookies are sent over the wire
- Nothing jumps out in https://docs.python.org/3/whatsnew/3.13.html
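A small sketch to make the notes above concrete. It assumes Python 3.8+ on a platform where threading.get_native_id() is available, and copies COOKIE_FMT from the note rather than importing it from mitogen.

```python
import struct
import threading

# Copied from the note above: one unsigned 64-bit field, then three signed ones.
COOKIE_FMT = '>Qqqq'

# The '>' prefix selects standard sizes, so this is 4 fields * 8 bytes.
cookie_size = struct.calcsize(COOKIE_FMT)

ident = threading.get_ident()          # Python-level identifier, not an OS thread id
native_id = threading.get_native_id()  # OS-assigned thread id (Python 3.8+)

print(cookie_size, ident, native_id)
```

On the affected device the first number would be expected to stay the same, while `ident` is the value that overflows the `q` field.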
>>> thread.get_ident()
18446744072484806452
It doesn't look like a boundary value, or a misinterpreted "special" value like -1
>>> hex(18446744072484806452)
'0xffffffffb6ffdf34'
>>> hex (2**64-1)
'0xffffffffffffffff'
Not seeing the larger value on Ubuntu 24.04 with Python 3.13.1 installed with uv 0.5.11 on aarch64. I don't know what the standalone Pythons used by uv are linked against.
alex@ubuntu2404:~/src$ uv run --python 3.13 python
Python 3.13.1 (main, Dec 19 2024, 14:23:30) [GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread; _thread.get_ident()
274109895942176
>>> import _thread; hex(_thread.get_ident())
'0xf94d2efa2020'
>>> hex(2**64-1)
'0xffffffffffffffff'
alex@ubuntu2404:~/src$ uname -a
Linux ubuntu2404 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec 5 13:32:09 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Ditto in the Python 3.13.1 Alpine container image
alex@ubuntu2404:~/src$ podman run -it python:3.13.1-alpine3.21
Resolved "python" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull docker.io/library/python:3.13.1-alpine3.21...
...
Python 3.13.1 (main, Dec 10 2024, 00:50:27) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread
>>> hex(_thread.get_ident())
'0xea2b50e5ad20'
>>> hex(2**64-1)
'0xffffffffffffffff'
I'm also seeing this on an oddball piece of hardware - it's a SPARC64 machine running Python 3.13.3, Debian unstable. SPARC64 is a big-endian architecture, which may be a contributing cause. The return value from thread.get_ident I'm working with was 18444492278191025856.
Playing around a bit suggests get_native_id might be a workable alternative to get_ident, though all ints in Python 3 are formally unbounded and can have any size. We probably need to handle that more generally with this cookie. I'll also note that the original report is on an armv7l processor which is an interesting data point.
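One way to handle arbitrary ints while keeping the existing `q` fields would be to wrap the value into the signed 64-bit range before packing. This is only a sketch: `wrap_to_i64` is a hypothetical helper, not something that exists in mitogen, and wrapping only keeps values distinct if they differ in their low 64 bits.

```python
import struct

def wrap_to_i64(value):
    """Hypothetical helper: reduce any int to the signed 64-bit range (two's complement wrap)."""
    value &= (1 << 64) - 1   # keep only the low 64 bits
    if value >= 1 << 63:     # top bit set: reinterpret as a negative number
        value -= 1 << 64
    return value

# The three get_ident() values reported in this thread.
reported = [18446744072484806452, 18444492278191025856, 3069599540]

wrapped = [wrap_to_i64(n) for n in reported]

# Each wrapped value now packs as a signed 64-bit integer without raising.
packed = [struct.pack('>q', w) for w in wrapped]

print(wrapped)
```

The wrap is deterministic, so the same thread id always maps to the same field value, which is the property the cookie relies on.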
Frankly, we could probably replace the whole thing with a SHA-256 hash of the existing inputs - the cookie is already 256 bits long and just needs to be stable given the same PID, thread ID, and object ID.
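A rough sketch of what that could look like. `make_cookie_sha256` is a hypothetical name, the choice of repr() for serialising the inputs is an assumption, and the sketch only targets Python 3. It is not a drop-in patch for `_make_cookie`.

```python
import hashlib
import os
import threading

def make_cookie_sha256(pid, thread_id, object_id):
    """Hypothetical replacement: hash the three inputs into a 32-byte cookie."""
    # repr() of a tuple of ints is unambiguous and works for ints of any size.
    material = repr((pid, thread_id, object_id)).encode('ascii')
    return hashlib.sha256(material).digest()

class Dummy(object):
    """Stand-in for whatever object the cookie is tied to."""

obj = Dummy()
cookie = make_cookie_sha256(os.getpid(), threading.get_ident(), id(obj))

# The thread id from the original report no longer needs to fit any struct format.
big_cookie = make_cookie_sha256(1234, 18446744072484806452, id(obj))

print(len(cookie), len(big_cookie))
```

The digest is 32 bytes, the same length as the packed `>Qqqq` cookie, and it is stable for a given PID, thread ID, and object ID. Unlike the struct version, the individual fields cannot be unpacked from it again, so this only fits if the cookie is compared and never decoded.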