cypari2
cypari2 copied to clipboard
Make cypari2 threadsafe
As discussed here, the update from Sage 9.2 to 9.3 caused new segmentation faults when running with threads, which we believe is related to the cypari upgrade. Here's an illustration of the problem just with cypari.
Python 3.9.5 (default, May 4 2021, 03:33:11)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cypari2
>>> pari = cypari2.Pari()
>>> pari.issquarefree(15)
1
>>> from concurrent.futures import ThreadPoolExecutor
>>> with ThreadPoolExecutor() as e:
... j = e.submit(pari.issquarefree, 15)
...
>>> j.result()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
return self.__get_result()
File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
raise self._exception
File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "cypari2/auto_instance.pxi", line 14650, in cypari2.pari_instance.Pari_auto.issquarefree
File "cypari2/gen.pyx", line 4813, in cypari2.gen.objtogen
File "cypari2/convert.pyx", line 561, in cypari2.convert.PyObject_AsGEN
cysignals.signals.SignalError: Segmentation fault
Traceback with gdb
0x00007ffff708788e in new_chunk (x=<optimized out>) at ../src/kernel/none/level1.h:125
125 ../src/kernel/none/level1.h: Aucun fichier ou dossier de ce type.
(gdb) tb
Temporary breakpoint 1 at 0x7ffff708788e: file ../src/kernel/none/level1.h, line 125.
(gdb) bt
#0 0x00007ffff708788e in new_chunk (x=<optimized out>) at ../src/kernel/none/level1.h:125
#1 cgeti (x=<optimized out>) at ../src/kernel/none/level1.h:198
#2 __pyx_f_7cypari2_7convert_PyLong_AS_GEN (__pyx_v_x=__pyx_v_x@entry=0x7ffff7300af0)
at cypari2/convert.c:3721
#3 0x00007ffff7088858 in __pyx_f_7cypari2_7convert_PyObject_AsGEN (__pyx_v_x=<optimized out>)
at cypari2/convert.c:4151
#4 0x00007ffff5ff7fc6 in __pyx_f_7cypari2_3gen_objtogen (__pyx_v_s=0x7ffff7300af0,
__pyx_skip_dispatch=<optimized out>) at cypari2/gen.c:186911
#5 0x00007ffff6cd1b09 in __pyx_pf_7cypari2_13pari_instance_9Pari_auto_876issquarefree (
__pyx_v_x=0x7ffff7300af0, __pyx_v_self=<optimized out>) at cypari2/pari_instance.c:100713
#6 0x00007ffff6d2c859 in __Pyx_CyFunction_CallMethod (kw=0x0, arg=0x7ffff71c8f40,
self=0x7ffff5e2b540, func=0x7ffff5cc7a00) at cypari2/pari_instance.c:250940
#7 __Pyx_CyFunction_CallAsMethod (kw=0x0, args=<optimized out>, func=0x7ffff5cc7a00)
at cypari2/pari_instance.c:54373
#8 __Pyx_CyFunction_CallAsMethod (func=0x7ffff5cc7a00, args=<optimized out>, kw=0x0)
at cypari2/pari_instance.c:54357
I doubt https://trac.sagemath.org/ticket/31029 to be the source of problems (this was a minor upgrade). Did you test it with other cypari2 versions ?
Also, what is the paralellization enabled into your PARI/GP (the option --mt
of the Configure
script) ?
I ran cypari using sage -python
, and the contents of $SAGE_ROOT/local/lib/pari/pari.cfg
include MT_LIBS='-lpthread'
in the failing case, and MT_LIBS=''
in the working case. So that sounds plausible as an explanation. I don't know where this is getting set though.
As a test to see if it's #31029, I'll build 9.3.beta5 and 9.3.beta5+#31029 tonight and see if that makes a difference. If you can let me know how to change the MT
option to pari from the sage build system I'd be happy to try that as well.
Thanks!
@roed314, you can change it here: https://github.com/sagemath/sage/blob/develop/build/pkgs/pari/spkg-install.in#L130 This change happened on sage 9.3 beta8
I just checked and you were right. The problem is due to #30537 rather than #31029: there is no error with 9.3.beta7, but adding #30537 yields the problem described.
What resolution would you suggest? Changing the parallelization option for pari in Sage back to MT_LIBS=''
is possible of course. I don't know if there's a way to change cypari to allow the use of lpthread
.
I believe that some of the PARI functions interacting with the stack have to be used with care in a multithreaded PARI. I will ask Bill Allombert.
Could you copy/paste the gdb traceback (ie launch sage with sage -gdb
and after the crash run bt
)? On my archlinux version I got
0x00007fffec00413d in ?? ()
from /usr/lib/python3.9/site-packages/sage/libs/pari/convert_gmp.cpython-39-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007fffec00413d in ?? ()
from /usr/lib/python3.9/site-packages/sage/libs/pari/convert_gmp.cpython-39-x86_64-linux-gnu.so
#1 0x00007fffee94afd8 in ?? ()
from /usr/lib/python3.9/site-packages/sage/rings/integer.cpython-39-x86_64-linux-gnu.so
#2 0x00007fffee94b2ce in ?? ()
from /usr/lib/python3.9/site-packages/sage/rings/integer.cpython-39-x86_64-linux-gnu.so
#3 0x00007fffec1c0877 in ?? ()
from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#4 0x00007fffec1e0c35 in ?? ()
from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#5 0x00007fffec5f5d89 in ?? ()
from /usr/lib/python3.9/site-packages/cypari2/pari_instance.cpython-39-x86_64-linux-gnu.so
#6 0x00007fffefd9fafb in ?? ()
from /usr/lib/python3.9/site-packages/zmq/backend/cython/error.cpython-39-x86_64-linux-gnu.so
which is weird since that involves sage integers...
Ha in sage 15 is an integer... should be 15r, traceback is different
#0 0x00007fffec04baea in ?? () from /usr/lib/python3.9/site-packages/cypari2/convert.cpython-39-x86_64-linux-gnu.so
#1 0x00007fffec04c1e0 in ?? () from /usr/lib/python3.9/site-packages/cypari2/convert.cpython-39-x86_64-linux-gnu.so
#2 0x00007fffec1e0dd1 in ?? () from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#3 0x00007fffec5f5d89 in ?? () from /usr/lib/python3.9/site-packages/cypari2/pari_instance.cpython-39-x86_64-linux-gnu.so
#4 0x00007fffefd9fafb in ?? () from /usr/lib/python3.9/site-packages/zmq/backend/cython/error.cpython-39-x86_64-linux-gnu.so
I am unsure about the mechanism underlying ThreadPoolExecutor
. On the PARI/GP side special care has to be taken with pari when using threads
-
pari_thread_alloc
/pari_thread_free
must be used to pass data topthread_create
-
pari_thread_start
/pari_thread_close
must be called in each children handling computations
Neither @edgarcosta or I has been able to get gdb working correctly, but it sounds like you don't need it anymore?
I think Pari's requirements for using threads is going to make higher level threading interfaces (like Python's ThreadPoolExecutor
) very difficult to use. Now that we've identified the issue, I think that from the LMFDB's perspective we're going to just turn off threading in flask, which solves the problem for us.
From Sage's perspective, I'm going to ask on #30537 about why they made that change to MT_LIBS
.
I'm not sure what you want to do with cypari. Let us know if there's anything else we can do to help.
@videlec here is my attempt to do what you asked us: https://gist.github.com/edgarcosta/55870bd5df1ac2f326931a2068a9c775
but I don't seem to manage to catch the traceback this way.
Neither @edgarcosta or I has been able to get gdb working correctly, but it sounds like you don't need it anymore?
I think I reproduced enough. Though what Edgar sent is very different from mine...
I think Pari's requirements for using threads is going to make higher level threading interfaces (like Python's
ThreadPoolExecutor
) very difficult to use. Now that we've identified the issue, I think that from the LMFDB's perspective we're going to just turn off threading in flask, which solves the problem for us.From Sage's perspective, I'm going to ask on #30537 about why they made that change to
MT_LIBS
.
Disabling --mt=pthreads
is only a partial solution. Multithreading ought to be supported. The only way I see for now would be to implement hooks (at C level) for ThreadPoolExecutor
. I will find some time to ask the question on the Python developer list. I will report it here.
I'm not sure what you want to do with cypari. Let us know if there's anything else we can do to help.
Thanks for the report! I am glad that you found a way around.
Flask looks to use the threading
Python package, which is also used by concurrent.futures
. These libraries seem to have hooks where you could insert the required pari function calls.
We're running into this again. We managed to disable threading most of the time, but threading is required to use the flask debugger, so we're not currently able to use the debugger in the LMFDB.
If anyone's interested in tackling this, we'd be grateful!
I would like to add to the requests by @roed314 and @edgarcosta . With Sage 9.5 due to be released soon it would be unfortunate for this problem to still be there. Thank you, cypari developers!
Just to echo @JohnCremona, we would be extremely grateful to anyone who can help with this, it is now impacting LMFDB development in a pretty serious way.
@JohnCremona @AndrewVSutherland cypari2 very welcome pull requests.
We totally understand that you (and other people with more cypari
knowledge) may not have time to work on this soon. We've been looking into what might be needed, but are still a bit uncertain about how to proceed. Based on Appendix B of the pari manual, it looks like we need to add Python-accessible mechanisms to cypari that can call the appropriate thread-functions in C (pari_thread_start
, pari_thread_close
, pari_thread_alloc
and pari_thread_free
). We'd then need to figure out the appropriate places in the CPython thread classes to call these, and possibly add some convenience classes to cypari for using threads.
Neither @edgarcosta or I have worked much with the PARI stack, and we would appreciate any tips you can give us about where to begin. If the best option is to email the pari developer list and ask there, we'd be happy to do that.
Usage examples of multithread in PARI/GP are available in examples/thread.c
and examples/openmp.c
.
As Bill mentioned to me, one has to be careful : in a multithreaded compiled PARI, some functions are likely to spawn thread. When using ThreadPoolExecutor
we typically want to deactivate multithreading inside PARI.
Thanks to Bill, I have a working prototype. I hope to push a branch soon.
@JohnCremona @AndrewVSutherland @roed314 @edgarcosta Could you test the branch at #116 to see if you can break it ?
Absolutely! I'll take a look.
In my work I have switched from threading (https://docs.python.org/3/library/threading.html) to multiprocessing (https://docs.python.org/3/library/multiprocessing.html) and no PARI segfaults. Of course this is not the same thing but it could help someone that comes across this due to having segfaults while parallelising in Sage.