Deadlock occurs after running with multiple threads for a while

Open can9010 opened this issue 5 years ago • 14 comments

ARM Linux environment.

(gdb) bt
#0  0x0000007f90ea9064 in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x0000007f90ea1a4c in pthread_mutex_lock ()
   from target:/lib64/libpthread.so.0
#2  0x00000000004cad80 in elog_port_output_lock ()
    at ./3rd_lib/easylogger/port/elog_port.c:76
#3  0x00000000004cbf78 in elog_get_filter_tag_lvl (tag=0x651c40 "TASK")
    at ./3rd_lib/easylogger/src/elog.c:435
#4  0x00000000004cc2e8 in elog_output (level=2 '\002', tag=0x651c40 "TASK", 
    file=0x651968 "./devices/devices.c", 
    func=0x651f68 <__FUNCTION__.12424> "timer1_handle", line=217, 
    format=0x651c18 "timming update_devices_status %d")
    at ./3rd_lib/easylogger/src/elog.c:528
#5  0x00000000004e9314 in timer1_handle (sig=10) at ./devices/devices.c:217
#6  <signal handler called>
#7  0x0000007f90d0f6d8 in nanosleep () from target:/lib64/libc.so.6
#8  0x0000007f90d0f568 in sleep () from target:/lib64/libc.so.6
#9  0x00000000004e75ec in main (argc=1, argv=0x7fea78d518) at ./main.c:98
(gdb) thread 2
[Switching to thread 2 (Thread 3106.3107)]
#0  0x0000007f90ea9034 in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) bt
#0  0x0000007f90ea9034 in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x0000007f90ea1a4c in pthread_mutex_lock ()
   from target:/lib64/libpthread.so.0
#2  0x00000000004cad80 in elog_port_output_lock ()
    at ./3rd_lib/easylogger/port/elog_port.c:76
#3  0x00000000004cbf78 in elog_get_filter_tag_lvl (tag=0x651c40 "TASK")
    at ./3rd_lib/easylogger/src/elog.c:435
#4  0x00000000004cc2e8 in elog_output (level=2 '\002', tag=0x651c40 "TASK", 
    file=0x651968 "./devices/devices.c", 
    func=0x651f68 <__FUNCTION__.12424> "timer1_handle", line=226, 
    format=0x651c48 "timming check_devices_status %d")
    at ./3rd_lib/easylogger/src/elog.c:528
#5  0x00000000004e93b4 in timer1_handle (sig=10) at ./devices/devices.c:226
#6  <signal handler called>
#7  0x0000007f90d2e6e8 in write () from target:/lib64/libc.so.6
#8  0x0000007f90cddeb0 in _IO_file_write () from target:/lib64/libc.so.6
#9  0x0000007f90cdd2f8 in ?? () from target:/lib64/libc.so.6
#10 0x0000007f90cde648 in _IO_file_xsputn () from target:/lib64/libc.so.6
#11 0x0000007f90cb8e90 in ?? () from target:/lib64/libc.so.6
#12 0x0000007f90cb6824 in vfprintf () from target:/lib64/libc.so.6
#13 0x0000007f90cbd948 in printf () from target:/lib64/libc.so.6
#14 0x00000000004cad54 in elog_port_output (
    log=0x7065b8 <poll_get_buf> "D/HEX JP_PLC: 0000-0017: 55 ...***... 0A"..., size=127)
    at ./3rd_lib/easylogger/port/elog_port.c:65
#15 0x00000000004cde18 in async_output (arg=0x0)
    at ./3rd_lib/easylogger/src/elog_async.c:299
#16 0x0000007f90e9f0e8 in start_thread () from target:/lib64/libpthread.so.0
#17 0x0000007f90d3af4c in ?? () from target:/lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 3106.3108)]
#0  0x0000007f90ea9064 in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) bt
#0  0x0000007f90ea9064 in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x0000007f90ea1a4c in pthread_mutex_lock () from target:/lib64/libpthread.so.0
#2  0x00000000004cad80 in elog_port_output_lock ()
    at ./3rd_lib/easylogger/port/elog_port.c:76
#3  0x00000000004cbf78 in elog_get_filter_tag_lvl (tag=0x651c40 "TASK")
    at ./3rd_lib/easylogger/src/elog.c:435
#4  0x00000000004cc2e8 in elog_output (level=2 '\002', tag=0x651c40 "TASK", 
    file=0x651968 "./devices/devices.c", 
    func=0x651f68 <__FUNCTION__.12424> "timer1_handle", line=226, 
    format=0x651c48 "timming check_devices_status %d")
    at ./3rd_lib/easylogger/src/elog.c:528
#5  0x00000000004e93b4 in timer1_handle (sig=10) at ./devices/devices.c:226
#6  <signal handler called>
#7  0x0000007f90d0f6d8 in nanosleep () from target:/lib64/libc.so.6
#8  0x0000007f90d34d08 in usleep () from target:/lib64/libc.so.6
#9  0x00000000004e6bf4 in http_deal (arg=0x0) at ./service/http.c:415
#10 0x0000007f90e9f0e8 in start_thread () from target:/lib64/libpthread.so.0
#11 0x0000007f90d3af4c in ?? () from target:/lib64/libc.so.6

can9010 avatar Mar 19 '20 04:03 can9010

Could you describe the exact symptoms? Have you tested with the Linux demo that ships with the project?

armink avatar Mar 20 '20 07:03 armink

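It is the Linux demo ported to an ARM Linux board; nothing in the EasyLogger part was changed. Under gdb there are 11 threads in total, plus one timer callback. The problem only appears after a few hours under a certain level of stress testing.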

can9010 avatar Mar 22 '20 01:03 can9010

https://github.com/armink/EasyLogger/blob/master/demo/os/linux/easylogger/port/elog_port.c#L77

Try adding some logging here to record which thread last acquired the lock successfully, and what state that thread is in.
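One way to capture that information is sketched below. It assumes the port's output lock is a plain (non-recursive) pthread mutex; `output_lock`, `traced_output_lock`, `traced_output_unlock`, and the bookkeeping variables are hypothetical names for illustration, not part of EasyLogger. The bookkeeping is stored in globals rather than logged, because logging from inside the lock would itself need the lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the mutex used by the port's output lock. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

/* Debug bookkeeping, meant to be inspected from gdb after a hang. */
static pthread_t last_owner;               /* last thread that got the lock */
static volatile int lock_held;             /* 1 while a thread is inside    */
static volatile unsigned long lock_count;  /* successful acquisitions       */

void traced_output_lock(void) {
    pthread_mutex_lock(&output_lock);
    /* We own the mutex now, so nobody else may still be marked as holder. */
    assert(!lock_held);
    last_owner = pthread_self();
    lock_held = 1;
    lock_count++;
}

void traced_output_unlock(void) {
    /* Clear the marker before releasing, while we still own the mutex. */
    lock_held = 0;
    pthread_mutex_unlock(&output_lock);
}
```

If the program hangs with `lock_held` still set, `last_owner` identifies the thread that took the lock and never released it; comparing it with the thread list in gdb shows whether that thread still exists and what it is doing.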

armink avatar Mar 22 '20 03:03 armink

I've run into the same problem.

mingpuwu avatar May 27 '20 13:05 mingpuwu

https://github.com/armink/EasyLogger/blob/master/demo/os/linux/easylogger/port/elog_port.c#L77

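Try adding some logging here to record which thread last acquired the lock successfully, and what state that thread is in.

If the thread that made the last call was cancelled right after calling, would that cause a problem?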

mingpuwu avatar May 27 '20 13:05 mingpuwu

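https://github.com/armink/EasyLogger/blob/master/demo/os/linux/easylogger/port/elog_port.c#L77 Try adding some logging here to record which thread last acquired the lock successfully, and what state that thread is in.

If the thread that made the last call was cancelled right after calling, would that cause a problem?

How is the thread cancelled? Does it exit normally, or is it forced?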

armink avatar May 28 '20 00:05 armink

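https://github.com/armink/EasyLogger/blob/master/demo/os/linux/easylogger/port/elog_port.c#L77 Try adding some logging here to record which thread last acquired the lock successfully, and what state that thread is in.

If the thread that made the last call was cancelled right after calling, would that cause a problem?

How is the thread cancelled? Does it exit normally, or is it forced?

I use pthread_cancel. Does that matter? Could the thread have been cancelled after entering the lock, so that the unlock never happened?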

mingpuwu avatar May 28 '20 01:05 mingpuwu

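That's possible. Having one thread directly cancel another is quite unsafe. I'd suggest a notification approach instead: notify the target thread and let it return and exit on its own.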
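A minimal sketch of that notification approach, assuming the worker can poll a flag between units of work. The `struct worker` type and the `worker_start` / `worker_stop` helpers are hypothetical names for illustration and are not part of EasyLogger.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical worker state: the owner sets `stop`, the worker polls it. */
struct worker {
    pthread_t tid;
    atomic_bool stop;
    atomic_ulong iterations;
};

static void *worker_main(void *arg) {
    struct worker *w = arg;
    while (!atomic_load(&w->stop)) {
        /* One unit of work. Any lock taken here is released before the
         * flag is checked again, so the thread never exits holding it. */
        atomic_fetch_add(&w->iterations, 1);
        usleep(1000);
    }
    return NULL; /* normal return instead of being cancelled mid-call */
}

int worker_start(struct worker *w) {
    assert(w != NULL);
    atomic_init(&w->stop, false);
    atomic_init(&w->iterations, 0);
    return pthread_create(&w->tid, NULL, worker_main, w);
}

int worker_stop(struct worker *w) {
    assert(w != NULL);
    atomic_store(&w->stop, true);      /* notify; no pthread_cancel() */
    return pthread_join(w->tid, NULL); /* wait until it has really exited */
}
```

Because the worker only leaves its loop at the flag check, it cannot disappear between a lock and the matching unlock, which is the risk raised above for `pthread_cancel`.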

armink avatar May 28 '20 01:05 armink

I'll give it a try, thanks.

mingpuwu avatar May 28 '20 01:05 mingpuwu

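I'm also on Linux and hit a deadlock. It turned out to be a mistake in my own port. There is an easylogger folder under demo and another one in the repository root, and the two are easy to confuse. After I overwrote the root easylogger directory with the one from the demo, the problem no longer appeared.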

SmartElec avatar Sep 22 '20 05:09 SmartElec

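Check whether the code uses signals. If a lock has not been released and the thread is interrupted by a signal, it will deadlock. Also, don't cancel threads casually in a multithreaded program; ideally a thread keeps running once it has started.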

mingpuwu avatar Sep 22 '20 10:09 mingpuwu

Check whether the code uses signals. If a lock has not been released and the thread is interrupted by a signal, it will deadlock. Also, don't cancel threads casually in a multithreaded program; ideally a thread keeps running once it has started.

Has your problem been solved? I redid the port and haven't seen the problem again so far.

SmartElec avatar Sep 23 '20 01:09 SmartElec

Using a timer together with threads can lead to a deadlock. Reference: https://clodfisher.github.io/2018/10/AlarmAndPthread/

liaojieliang avatar Apr 07 '22 10:04 liaojieliang

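Using a timer together with threads can lead to a deadlock. Reference: https://clodfisher.github.io/2018/10/AlarmAndPthread/

Yes. EasyLogger takes a lock when it outputs a log. If the timer preempts a thread while that thread holds the lock, the result is a deadlock.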

can9010 avatar Apr 18 '22 09:04 can9010