gmpy
possible issue (sigsegv) with cached_context
Hi!
I've been tracking down a very rare crash in our application, and I think I've reached a point where I can see a problem in gmpy2. I would very much appreciate it if you could tell me whether I'm dreaming or not!
I believe we are crashing at this line:
/* Return borrowed reference to thread local context. */
static CTXT_Object *
GMPy_current_context(void)
{
    PyThreadState *tstate = PyThreadState_GET();
    if (cached_context && cached_context->tstate == tstate) {  /* <=== the second part of this condition segfaults, dereferencing cached_context */
        return (CTXT_Object*)cached_context;
    }
    return current_context_from_dict();
}
gdb dump showing the instruction where the fault occurred:
Dump of assembler code for function GMPy_current_context:
0x00007f6b928eaaf0 <+0>: push %rbp
0x00007f6b928eaaf1 <+1>: push %rbx
0x00007f6b928eaaf2 <+2>: sub $0x8,%rsp
# I suspect this is the call to PyThreadState_GET
0x00007f6b928eaaf6 <+6>: callq *0x70dec(%rip) # 0x7f6b9295b8e8
0x00007f6b928eaafc <+12>: mov 0x77305(%rip),%rbx # 0x7f6b92961e08 <cached_context>
# I'm fairly sure this is the `if cached_context`
0x00007f6b928eab03 <+19>: test %rbx,%rbx
0x00007f6b928eab06 <+22>: je 0x7f6b928eab0e <GMPy_current_context+30>
# and then this is the `if cached_context->tstate == tstate` - and that's where the crash is
=> 0x00007f6b928eab08 <+24>: cmp %rax,0x78(%rbx)
0x00007f6b928eab0c <+28>: je 0x7f6b928eab4e <GMPy_current_context+94>
0x00007f6b928eab0e <+30>: callq *0x7087c(%rip) # 0x7f6b9295b390
0x00007f6b928eab14 <+36>: mov %rax,%rbp
0x00007f6b928eab17 <+39>: test %rax,%rax
Is it possible that the cached_context could point to a thread-local state for a thread that has since exited?
Thanks and Kind Regards, Tim
This is the traceback from the crash - gmpy2 is being used via sympy:
#8 <signal handler called>
#9 0x00007f6b928eab08 in GMPy_current_context () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#10 0x00007f6b929124ac in GMPy_RichCompare_Slot () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#11 0x000055fceaa200eb in do_richcompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#12 PyObject_RichCompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#13 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:796
#14 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:782
#15 tuplerichcompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/tupleobject.c:655
#16 0x000055fceaa1ea60 in do_richcompare (op=2, w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2)) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#17 PyObject_RichCompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#18 0x000055fceaac25cb in cmp_outcome (w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), op=<optimized out>, tstate=<optimized out>)
at /home/sat_bot/base/conda-bld/python_1648081724180/work/Python/ceval.c:5111
py-bt:
File "..../site-packages/sympy/core/numbers.py", line 668, in _eval_evalf
return Float._new(self._as_mpf_val(prec), prec)
(frame information optimized out)
File "..../site-packages/sympy/core/expr.py", line 912, in _eval_is_extended_negative
return self._eval_is_extended_positive_negative(positive=False)
File "..../site-packages/sympy/core/assumptions.py", line 501, in _ask
a = evaluate(obj)
File "..../site-packages/sympy/core/assumptions.py", line 513, in _ask
_ask(pk, obj)
IIRC, there have been some changes in handling thread_state in recent versions (3.9 or 3.10 ??).
What version of Python are you using?
Is it reproducible enough to justify the effort in testing with an older version, say 3.7?
This is with 3.8 - I haven't had time to work on a reproducer yet but that's my next step, certainly.
I found a simple way to reproduce the issue without involving any other libraries. It is a reference counting bug. I don't know where it is yet, but I can trigger it within a few seconds. Interestingly, I can trigger it with Python 3.7 but not with any later versions.
TL;DR I'm fairly certain I've solved the issue.
We've recently made some changes and are starting to work on the next major release, version 2.2. The minimum supported version of Python is now 3.7. Contextvars were introduced in Python 3.7 as a replacement for using thread-local storage to manage application contexts (such as gmpy2's). I converted to using contextvars and my intermittent crashes have stopped.
Can you test your application compiling from the latest source?
Case
@tjb900, binary wheels are now available for 2.2.0a1. Can you still reproduce the issue?