possible issue (sigsegv) with cached_context

Open tjb900 opened this issue 3 years ago • 5 comments

Hi!

I've been tracking down a very rare crash in our application, and I think I've reached a point where I can see a problem in gmpy2. I would very much appreciate it if you could tell me whether I'm dreaming or not!

I believe we are crashing at this line:

/* Return borrowed reference to thread local context. */
static CTXT_Object *
GMPy_current_context(void)
{
    PyThreadState *tstate = PyThreadState_GET();

    /* <=== the second operand below is where the segfault occurs, dereferencing cached_context */
    if (cached_context && cached_context->tstate == tstate) {
        return (CTXT_Object*)cached_context;
    }

    return current_context_from_dict();
}

gdb dump showing the instruction where the fault occurred:

Dump of assembler code for function GMPy_current_context:
   0x00007f6b928eaaf0 <+0>:     push   %rbp
   0x00007f6b928eaaf1 <+1>:     push   %rbx
   0x00007f6b928eaaf2 <+2>:     sub    $0x8,%rsp

# I suspect this is the call to PyThreadState_GET
   0x00007f6b928eaaf6 <+6>:     callq  *0x70dec(%rip)        # 0x7f6b9295b8e8
   0x00007f6b928eaafc <+12>:    mov    0x77305(%rip),%rbx        # 0x7f6b92961e08 <cached_context>

# I'm fairly sure this is the `if cached_context`
   0x00007f6b928eab03 <+19>:    test   %rbx,%rbx
   0x00007f6b928eab06 <+22>:    je     0x7f6b928eab0e <GMPy_current_context+30>

# and then this is the `if cached_context->tstate == tstate` - and that's where the crash is
=> 0x00007f6b928eab08 <+24>:    cmp    %rax,0x78(%rbx)
   0x00007f6b928eab0c <+28>:    je     0x7f6b928eab4e <GMPy_current_context+94>
   0x00007f6b928eab0e <+30>:    callq  *0x7087c(%rip)        # 0x7f6b9295b390
   0x00007f6b928eab14 <+36>:    mov    %rax,%rbp
   0x00007f6b928eab17 <+39>:    test   %rax,%rax

Is it possible that the cached_context could point to a thread-local state for a thread that has since exited?

Thanks and Kind Regards, Tim


This is the traceback from the crash - gmpy2 is being used via sympy:

#8  <signal handler called>
#9  0x00007f6b928eab08 in GMPy_current_context () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#10 0x00007f6b929124ac in GMPy_RichCompare_Slot () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#11 0x000055fceaa200eb in do_richcompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#12 PyObject_RichCompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#13 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:796
#14 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:782
#15 tuplerichcompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/tupleobject.c:655
#16 0x000055fceaa1ea60 in do_richcompare (op=2, w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2)) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#17 PyObject_RichCompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#18 0x000055fceaac25cb in cmp_outcome (w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), op=<optimized out>, tstate=<optimized out>)
    at /home/sat_bot/base/conda-bld/python_1648081724180/work/Python/ceval.c:5111

py-bt:

  File "..../site-packages/sympy/core/numbers.py", line 668, in _eval_evalf
    return Float._new(self._as_mpf_val(prec), prec)
  (frame information optimized out)
  File "..../site-packages/sympy/core/expr.py", line 912, in _eval_is_extended_negative
    return self._eval_is_extended_positive_negative(positive=False)
  File "..../site-packages/sympy/core/assumptions.py", line 501, in _ask
    a = evaluate(obj)
  File "..../site-packages/sympy/core/assumptions.py", line 513, in _ask
    _ask(pk, obj)

tjb900, Jul 27 '22 05:07

IIRC, there have been some changes in the handling of thread state in recent Python versions (3.9 or 3.10?).

What version of Python are you using?

Is it reproducible enough to justify the effort in testing with an older version, say 3.7?

casevh, Jul 28 '22 05:07

This is with 3.8 - I haven't had time to work on a reproducer yet but that's my next step, certainly.

tjb900, Jul 28 '22 06:07

I found a simple way to reproduce the issue without involving any other libraries. It is a reference counting bug. I don't know where it is yet, but I can trigger it within a few seconds. Interestingly, I can trigger it with Python 3.7 but not with any later version.

casevh, Sep 11 '22 05:09

TL;DR I'm fairly certain I've solved the issue.

We've recently made some changes and are starting to work on the next major release, version 2.2. The minimum supported version of Python is now 3.7. The contextvars module was introduced in Python 3.7 as a replacement for using thread-local storage to manage application contexts (such as gmpy2's). I converted gmpy2 to use contextvars and my intermittent crashes have stopped.
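For readers unfamiliar with contextvars, the following is a minimal pure-Python sketch of the general idea, not the actual gmpy2 change (which lives in the C extension). The variable `_precision` and the helpers `get_precision`, `set_precision` and `run_in_thread` are hypothetical names chosen for illustration.

```python
import contextvars
import threading

# Hypothetical setting standing in for one field of a numeric context.
_precision = contextvars.ContextVar("precision", default=53)


def get_precision():
    """Read the value visible in the caller's current context."""
    return _precision.get()


def set_precision(bits):
    """Change the value in the caller's current context only."""
    _precision.set(bits)


def run_in_thread(bits):
    """Set a precision inside a worker thread and report what it saw."""
    seen = {}

    def worker():
        set_precision(bits)
        seen["value"] = get_precision()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return seen["value"]


if __name__ == "__main__":
    print("worker saw:", run_in_thread(100))
    print("main thread still sees:", get_precision())
```

The interpreter tracks which context belongs to which thread, so a change made in the worker does not leak into the main thread, and the library no longer needs a hand-maintained cache that compares thread-state pointers.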

Can you test your application compiling from the latest source?

Case

casevh, Dec 24 '22 03:12

@tjb900, binary wheels are now available for 2.2.0a1. Can you still reproduce the issue?

skirpichev, Oct 04 '23 08:10