
kernel: dh895xcc 0000:60:00.0: Process nginx exit with orphan rings

Open lastpepole opened this issue 10 months ago • 5 comments

Ⅰ. Issue Description

QAT_Engine-1.5.0, QAT driver: QAT.L.4.23.0-00001, OpenSSL 1.1.1w

./sbin/nginx -c ./conf/nginx.conf

pstack 19641
Thread 2 (Thread 0x7f37d49b5700 (LWP 19642)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ededcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f37d7edec98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f37d68806cf in qat_timer_poll_func (ih=) at qat_hw_polling.c:152
#4 0x00007f37d7edcdd5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f37d6dd8ead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f37d8503740 (LWP 19641)):
#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f37d7ee0a42 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x00007f37d687f893 in qat_hw_init (e=e@entry=0x16fdc20) at qat_hw_init.c:770
#3 0x00007f37d687c6c0 in qat_engine_init (e=e@entry=0x16fdc20) at e_qat.c:603
#4 0x00007f37d687d250 in engine_init_child_at_fork_handler () at qat_fork.c:108
#5 0x00007f37d6da007e in fork () from /lib64/libc.so.6
#6 0x000000000043918e in ngx_daemon (log=0x1713d38) at src/os/unix/ngx_daemon.c:17
#7 0x0000000000413563 in main (argc=, argv=) at src/core/nginx.c:378
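To make the pstack output above easier to compare thread by thread, here is a minimal sketch that splits such a trace into threads and lists the function names in each. The helper name `split_threads` and the `TRACE` constant are hypothetical, introduced only for illustration. The sample is an abbreviated excerpt of the trace in this issue, and the regular expressions assume the gdb/pstack frame layout shown there.

```python
import re

# Abbreviated excerpt of the pstack output quoted in this issue
# (some frames are omitted to keep the sample short).
TRACE = (
    "Thread 2 (Thread 0x7f37d49b5700 (LWP 19642)): "
    "#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0 "
    "#2 0x00007f37d7edec98 in pthread_mutex_lock () from /lib64/libpthread.so.0 "
    "#3 0x00007f37d68806cf in qat_timer_poll_func (ih=) at qat_hw_polling.c:152 "
    "Thread 1 (Thread 0x7f37d8503740 (LWP 19641)): "
    "#0 0x00007f37d7ee34ed in __lll_lock_wait () from /lib64/libpthread.so.0 "
    "#1 0x00007f37d7ee0a42 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 "
    "#2 0x00007f37d687f893 in qat_hw_init (e=e@entry=0x16fdc20) at qat_hw_init.c:770 "
    "#4 0x00007f37d687d250 in engine_init_child_at_fork_handler () at qat_fork.c:108 "
    "#5 0x00007f37d6da007e in fork () from /lib64/libc.so.6"
)

# Thread header, e.g. "Thread 2 (Thread 0x7f37d49b5700 (LWP 19642)):"
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# Frame with an address, e.g. "#0 0x00007f37d7ee34ed in __lll_lock_wait ("
# (frames printed without an address are not matched by this sketch).
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\S+) \(")


def split_threads(trace):
    """Return {thread number: [function names, innermost frame first]}."""
    headers = list(THREAD_RE.finditer(trace))
    threads = {}
    for idx, header in enumerate(headers):
        # A thread's frames run from its header to the next header (or the end).
        end = headers[idx + 1].start() if idx + 1 < len(headers) else len(trace)
        body = trace[header.end():end]
        threads[int(header.group(1))] = [m.group(2) for m in FRAME_RE.finditer(body)]
    return threads


threads = split_threads(TRACE)
for number, frames in threads.items():
    print(f"Thread {number}: " + " <- ".join(frames))
```

Read this way, the trace shows both threads blocked in `__lll_lock_wait`: thread 2 through `pthread_mutex_lock` in `qat_timer_poll_func`, and thread 1 through `pthread_cond_wait` in `qat_hw_init`, which is reached from `engine_init_child_at_fork_handler` during `fork`.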

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

  1. If applicable, add nginx debug log doc.

Ⅵ. Environment:

  • Tengine version (use sbin/nginx -V):
    Tengine version: Tengine/2.4.0
    nginx version: nginx/1.22.1
    built by gcc 4.8.5
    built with OpenSSL 1.1.1w 11 Sep 2023
    TLS SNI support enabled
    configure arguments: --prefix=/home/test/third/tengine --with-openssl-async --with-openssl=/usr/local/openssl
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.5
  • Kernel (e.g. uname -a): 5.10
  • Others:

lastpepole • Apr 16 '24 05:04

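This stack trace is incomplete. Which signal caused the exit?

Also, does "exit" here specifically mean a core dump?

If it did dump core, please provide the complete stack trace and the debug-level error log.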

lianglli • Apr 16 '24 08:04

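This stack trace is incomplete. Which signal caused the exit?

Also, does "exit" here specifically mean a core dump?

If it did dump core, please provide the complete stack trace and the debug-level error log.

There is no core dump. The process hangs in __lll_lock_wait, and the "nginx exit with orphan rings" message appears in /var/log/messages.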

lastpepole • Apr 16 '24 11:04

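@lianglli The problem above reproduces reliably: Tengine hangs as soon as it is started. Could you please take a look, or try to reproduce it, to find out what is going wrong?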

lastpepole • Apr 17 '24 03:04

For reference: Ice Lake SSL/TLS acceleration in practice (Ice Lake SSL/TLS加速实践) https://openanolis.cn/sig/crypto/doc/390714951012679780

lianglli • Apr 18 '24 03:04

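For reference: Ice Lake SSL/TLS acceleration in practice (Ice Lake SSL/TLS加速实践) https://openanolis.cn/sig/crypto/doc/390714951012679780

@lianglli The hang looks like a problem in the QAT engine code. Could you please take a look at this issue: https://github.com/alibaba/tengine/issues/1932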

lastpepole • Apr 18 '24 03:04