Segfault in multi-threaded script
Hello,
I have been working with EasySNMP for a while now and have written a script that uses it. It worked fine, but the SNMP queries were slow, so I started threading it, since the same code is executed 3 times in a row. Now threads are created automatically, each executes the same code (including the SNMP queries), and the main thread waits for each one to finish before running the rest of the code. Sometimes everything works perfectly, and other times I get a segfault. I ran the script under GDB to get more information.
Here is the backtrace:
$0 0x00000000005e9d59 in PyUnicodeUCS4_FromFormatV ()
$1 0x00007ffff5ca652b in py_log_msg (log_level=3, printf_fmt=
) at easysnmp/interface.c:3397
$2 0x00007ffff5ca9c25 in netsnmp_walk (self=0x0, args=0x0) at easysnmp/interface.c:2786
$3 0x00000000004c9e05 in PyEval_EvalFrameEx ()
$4 0x00000000004c87a1 in PyEval_EvalCodeEx ()
$5 0x00000000004ca31a in PyEval_EvalFrameEx ()
$6 0x00000000004ca592 in PyEval_EvalFrameEx ()
$7 0x00000000004e5fe8 in ?? ()
$8 0x00000000004cc36b in PyEval_EvalFrameEx ()
$9 0x00000000004ca592 in PyEval_EvalFrameEx ()
$10 0x00000000004ca592 in PyEval_EvalFrameEx ()
$11 0x00000000004e5fe8 in ?? ()
$12 0x00000000005045d8 in ?? ()
$13 0x00000000004d1a1b in PyEval_CallObjectWithKeywords ()
$14 0x00000000005bc102 in ?? ()
$15 0x00007ffff7bc70a4 in start_thread (arg=0x7fffeef74700) at pthread_create.c:309
$16 0x00007ffff6fd904d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I tried keeping the same code but creating only one thread. In that case I never get a segfault, so the problem only appears when two new threads are running at the same time.
I know EasySNMP is not advertised as thread-safe, but if you could help me find a solution, that would be great.
Thanks, guys!
Hi guys,
I'm coming back to this post because I haven't found any solution other than running the script in non-threaded mode. Running it with multiple threads would be very useful. Has anyone ever succeeded in using threads with easysnmp?
Here are the new logs I get when the segfault occurs; I hope someone can help. (The problem may be in net-snmp rather than in easysnmp itself, but I'm not sure.)
févr. 12 11:30:01 Archlinux kernel: idirect-nms-snm[24090]: segfault at 0 ip 00007fbecd5b80ca sp 00007fbebbffbd18 error 4 in libc-2.22.so[7fbecd537000+19b000]
févr. 12 11:30:01 Archlinux systemd[1]: idirect-nms-snmp-infos-to-db.service: Control process exited, code=killed status=11
févr. 12 11:30:01 Archlinux systemd[1]: Failed to start the script collecting iDirect NMS SNMP infos and filling the web server database.
févr. 12 11:30:01 Archlinux systemd[1]: idirect-nms-snmp-infos-to-db.service: Unit entered failed state.
févr. 12 11:30:01 Archlinux systemd[1]: idirect-nms-snmp-infos-to-db.service: Failed with result 'signal'.
févr. 12 11:30:01 Archlinux systemd-coredump[24107]: Process 24065 (idirect-nms-snm) of user 0 dumped core.
Stack trace of thread 24090:
#0 0x00007fbecd5b80ca strlen (libc.so.6)
#1 0x00007fbecdbb5be9 PyUnicodeUCS4_FromFormatV (libpython2.7.so.1.0)
#2 0x00007fbec96e654b py_log_msg (interface.so)
#3 0x00007fbec96ea64d netsnmp_walk (interface.so)
#4 0x00007fbecdbe82be PyEval_EvalFrameEx (libpython2.7.so.1.0)
#5 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#6 0x00007fbecdbe7de3 PyEval_EvalFrameEx (libpython2.7.so.1.0)
#7 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#8 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#9 0x00007fbecdb6c7d0 function_call (libpython2.7.so.1.0)
#10 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#11 0x00007fbecdbe51fe PyEval_EvalFrameEx (libpython2.7.so.1.0)
#12 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#13 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#14 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#15 0x00007fbecdb6c6f8 function_call (libpython2.7.so.1.0)
#16 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#17 0x00007fbecdb5536f instancemethod_call (libpython2.7.so.1.0)
#18 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#19 0x00007fbecdbe061b PyEval_CallObjectWithKeywords (libpython2.7.so.1.0)
#20 0x00007fbecdc1db26 t_bootstrap (libpython2.7.so.1.0)
#21 0x00007fbecd8e24a4 start_thread (libpthread.so.0)
#22 0x00007fbecd62013d __clone (libc.so.6)
Stack trace of thread 24065:
#0 0x00007fbecd8ea2d7 do_futex_wait.constprop.1 (libpthread.so.0)
#1 0x00007fbecd8ea384 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
#2 0x00007fbecdc1945d PyThread_acquire_lock (libpython2.7.so.1.0)
#3 0x00007fbecdc1d682 lock_PyThread_acquire_lock (libpython2.7.so.1.0)
#4 0x00007fbecdbe82be PyEval_EvalFrameEx (libpython2.7.so.1.0)
#5 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#6 0x00007fbecdbe7de3 PyEval_EvalFrameEx (libpython2.7.so.1.0)
#7 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#8 0x00007fbecdbe7de3 PyEval_EvalFrameEx (libpython2.7.so.1.0)
#9 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#10 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#11 0x00007fbecdbeb4cc PyEval_EvalCode (libpython2.7.so.1.0)
#12 0x00007fbecdc064f3 run_mod (libpython2.7.so.1.0)
#13 0x00007fbecdc077a2 PyRun_FileExFlags (libpython2.7.so.1.0)
#14 0x00007fbecdc08ad9 PyRun_SimpleFileExFlags (libpython2.7.so.1.0)
#15 0x00007fbecdc1bb05 Py_Main (libpython2.7.so.1.0)
#16 0x00007fbecd557610 __libc_start_main (libc.so.6)
#17 0x0000555cf3446859 _start (python2.7)
Stack trace of thread 24086:
#0 0x00007fbecd618e23 __select (libc.so.6)
#1 0x00007fbec9419ebf snmp_sess_synch_response (libnetsnmp.so.30)
#2 0x00007fbec96e66ff __send_sync_pdu (interface.so)
#3 0x00007fbec96ea3f1 netsnmp_walk (interface.so)
#4 0x00007fbecdbe82be PyEval_EvalFrameEx (libpython2.7.so.1.0)
#5 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#6 0x00007fbecdbe7de3 PyEval_EvalFrameEx (libpython2.7.so.1.0)
#7 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#8 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#9 0x00007fbecdb6c7d0 function_call (libpython2.7.so.1.0)
#10 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#11 0x00007fbecdbe51fe PyEval_EvalFrameEx (libpython2.7.so.1.0)
#12 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#13 0x00007fbecdbe7fbd PyEval_EvalFrameEx (libpython2.7.so.1.0)
#14 0x00007fbecdbeb393 PyEval_EvalCodeEx (libpython2.7.so.1.0)
#15 0x00007fbecdb6c6f8 function_call (libpython2.7.so.1.0)
#16 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#17 0x00007fbecdb5536f instancemethod_call (libpython2.7.so.1.0)
#18 0x00007fbecdb44fe7 PyObject_Call (libpython2.7.so.1.0)
#19 0x00007fbecdbe061b PyEval_CallObjectWithKeywords (libpython2.7.so.1.0)
#20 0x00007fbecdc1db26 t_bootstrap (libpython2.7.so.1.0)
#21 0x00007fbecd8e24a4 start_thread (libpthread.so.0)
#22 0x00007fbecd62013d __clone (libc.so.6)
Thank you.
I know the project is currently in bad shape, as new maintainers are being sought. I hope everything will get back in order.
I've never tried threading with easysnmp but have had success with multiprocessing. Could you try using that instead?
Thank you for the suggestion, but I cannot switch from threading to multiprocessing without changing a lot of code, because the memory needs to be shared...
Can you make anything of the logs?
Unfortunately I do not. Personally, I have little experience with threading, and I do not know how the library was written. I see that threads 24090 and 24086 run simultaneously (frame #9 in each stack), as you said in your first comment, and both call netsnmp_walk at frame #3. Thread 24086 seems to have already sent its PDUs and is waiting for the responses. Thread 24090 instead calls py_log_msg from interface.so. Is the thread throwing an error because it's trying to lock an interface that hasn't been released yet (and hasn't been told to wait until it is available)?
By interface, do you mean network interface? My code does not use any locks. The threads only fetch data over SNMP and fill a dictionary, and the main thread waits for them to finish. If I am right, dictionaries are thread-safe in Python, so my code seems correct on this point. The only remaining explanation would be that easysnmp or net-snmp does not handle threads well, right?
The complexity likely comes from the C interface, which is probably not thread-safe. In Python, I would usually suggest multiprocessing anyway, to be honest: the two libraries work very similarly, but multiprocessing is much safer because each process is spawned with its own memory.
I'll leave this open for tracking though in case someone wants to solve it in the future.
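A minimal sketch of how the shared-memory concern might be sidestepped with multiprocessing: each worker returns its data, and the parent builds the dictionary itself. The `fetch` function and its return value are hypothetical stand-ins for the real SNMP work, not easysnmp calls, and the sketch assumes Python 3.

```python
from multiprocessing import Pool


def fetch(host):
    """Hypothetical stand-in for the per-host SNMP work (e.g. a walk)."""
    return host, {"name": "if-" + host}


def merge_results(pairs):
    """Build the combined dictionary in the parent from (host, data) pairs."""
    merged = {}
    for host, data in pairs:
        merged[host] = data
    return merged


def poll_all(hosts, processes=2):
    # Each worker process has its own memory, so nothing is shared;
    # results travel back to the parent as pickled return values.
    with Pool(processes) as pool:
        return merge_results(pool.map(fetch, hosts))
```

When run as a script, `poll_all([...])` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.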
+1
I am facing this problem too.
I have 2 threads running: one periodically walks interface data such as index, interface name, description, etc., and the other uses this information to get values such as bandwidth, interface errors, etc.
I also cannot run this in multiprocess mode because, as you can see, I have to share the data between the threads.
Can someone fix this ASAP? Otherwise I will have to switch to some other module.
You just need to add locking to your code, @mail3dexter @LEALCorentin. I can easily reproduce this issue, and locking fixes it without slowing the code down all that much.
From the core dump, here is where it is crashing:
(gdb) py-bt
Traceback (most recent call first):
  <built-in method walk of module object at remote 0x7fa2074a6278>
  File "/usr/local/lib/python3.6/dist-packages/easysnmp/session.py", line 467, in walk
    interface.walk(self, varlist)
  File "/snmp/Device.py", line 105, in discover
    walk = self.session.walk(oid)
  File "snmp_poll.py", line 97, in check_discovery
    dis = s.discover(i['discovery_oid'], i['filter'], i['replace'], i['strip'])
  File "snmp_poll.py", line 284, in poll_device
    check_discovery(s, d, name)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
You can create a lock via the multiprocessing module and wrap every walk in the lock's context manager; it won't crash after that. For example:
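from functools import partial
from multiprocessing.dummy import Pool, Manager  # thread-based Pool, not processes

from easysnmp import Session

# Placeholder OID (ifDescr); the original snippet left `oid` undefined.
oid = '1.3.6.1.2.1.2.2.1.2'

def poll_device(community, lock, ip):
    session = Session(hostname=ip, community=community, version=2, use_numeric=True)
    # Serialize the walk: only one thread at a time may be inside it.
    with lock:
        session.walk(oid)

ips = ['8.8.8.8', '1.1.1.1']
community = 'password'
m = Manager()
lock = m.Lock()  # one lock shared by all worker threads
pool = Pool(2)
func = partial(poll_device, community, lock)
pool.map(func, ips)
pool.close()
pool.join()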
Most likely there should just be a wiki page explaining that walking is not thread-safe and that you must put proper locking around it to avoid crashes.
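For the shared-dictionary case described earlier in the thread, the same idea can be sketched with the plain threading module. This is only an illustration under assumptions: `fake_walk` is a hypothetical stand-in for `session.walk(oid)` so the sketch runs without easysnmp or a reachable device, and the names `WALK_LOCK`, `poll`, and `poll_all` are made up for this example.

```python
import threading

# One lock shared by every thread; only the SNMP call is serialized.
WALK_LOCK = threading.Lock()


def fake_walk(host, oid):
    """Hypothetical stand-in for easysnmp's session.walk(oid)."""
    return ["{}:{}.{}".format(host, oid, i) for i in range(3)]


def poll(host, oid, results, walk=fake_walk):
    # Hold the lock only while inside the (assumed non-thread-safe) walk.
    with WALK_LOCK:
        rows = walk(host, oid)
    # Each thread writes under its own key in the shared dictionary.
    results[host] = rows


def poll_all(hosts, oid, walk=fake_walk):
    results = {}
    threads = [
        threading.Thread(target=poll, args=(host, oid, results, walk))
        for host in hosts
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the main thread waits for every worker
    return results
```

Because the lock serializes the walks themselves, only the work outside the `with` block overlaps between threads; whether that is fast enough depends on how much time the script spends outside SNMP calls.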