QAT_Engine
nginx worker crashed in ASYNC_WAIT_CTX_get_fd
Hi, I'm running nginx with OpenSSL 1.1.0e and QAT driver version 1.7. The nginx worker occasionally crashes in the function ASYNC_WAIT_CTX_get_fd. It seems that the async job allocated by OpenSSL has already been released by the time the QAT engine tries to wake it up. Here is the core dump stack:
#0 0x00007efff7328f18 in ASYNC_WAIT_CTX_get_fd () from /export/openssl/libcrypto.so.1.1
#1 0x00007efff665e1bf in qat_wake_job (job= , jobStatus= ) at qat_events.c:289
#2 0x00007efff63d75d7 in adf_user_notify_msgs_poll () from /usr/local/lib/libqat_s.so
#3 0x00007efff63d31b8 in adf_pollRing () from /usr/local/lib/libqat_s.so
#4 0x00007efff63d355f in icp_adf_pollInstance () from /usr/local/lib/libqat_s.so
#5 0x00007efff63cc5b9 in icp_sal_CyPollInstance () from /usr/local/lib/libqat_s.so
#6 0x00007efff665e4ce in poll_instances () at qat_polling.c:328
#7 0x00007efff665d7d6 in qat_engine_ctrl (e= , cmd= , i= , p=0x7ffc6ac74cac, f= ) at e_qat.c:835
#8 0x00007efff73da889 in ENGINE_ctrl_cmd () from /export/openssl/libcrypto.so.1.1
#9 0x00000000004a5e61 in qat_engine_poll (log=0x64c9d80) at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:542
#10 ngx_ssl_engine_qat_heuristic_poll (log=0x64c9d80) at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:666
#11 0x000000000044d1d6 in ngx_http_close_connection (c=c@entry=0x7effb4758700) at src/http/ngx_http_request.c:3782
#12 0x000000000045038c in ngx_http_ssl_handshake_handler (c=0x7effb4758700) at src/http/ngx_http_request.c:876
#13 0x000000000043d367 in ngx_ssl_handshake_handler (ev= ) at src/event/ngx_event_openssl.c:2114
#14 0x000000000043a27d in ngx_ssl_empty_handler (ev= ) at src/event/ngx_event_openssl.c:162
#15 0x000000000043095d in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#16 0x00000000004305ca in ngx_process_events_and_timers (cycle=cycle@entry=0x2e374f0) at src/event/ngx_event.c:264
#17 0x00000000004376e2 in ngx_worker_process_cycle (cycle=0x2e374f0, data= ) at src/os/unix/ngx_process_cycle.c:771
#18 0x0000000000435da2 in ngx_spawn_process (cycle=cycle@entry=0x2e374f0, proc=proc@entry=0x43764f <ngx_worker_process_cycle>, data=data@entry=0x1d, name=name@entry=0x4aac95 "worker process", respawn=respawn@entry=-4) at src/os/unix/ngx_process.c:199
#19 0x000000000043692b in ngx_start_worker_processes (cycle=cycle@entry=0x2e374f0, n=32, type=type@entry=-4) at src/os/unix/ngx_process_cycle.c:362
#20 0x000000000043840a in ngx_master_process_cycle (cycle=0x2e374f0, cycle@entry=0x729090) at src/os/unix/ngx_process_cycle.c:247
#21 0x000000000041271d in main (argc= , argv= ) at src/core/nginx.c:397
This situation may happen when the SSL handshake cannot be completed while QAT is still working on the crypto steps. If the async timer expires, nginx calls ngx_ssl_shutdown() anyway to stop the session, but the async job cannot be released, and QAT can still wake up the job when it finishes its work. How can this problem be solved? It seems that the OpenSSL library does not provide an API that lets applications release the async job.
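The suspected lifetime problem can be illustrated with a small self-contained model. This is not QAT Engine, nginx, or OpenSSL code: `job_t`, `job_submit`, `session_shutdown`, `on_hw_complete`, and the reference-count scheme are all hypothetical names chosen for illustration. It sketches one common way to keep a context valid until a late completion callback has run, assuming the owner of the context is able to hold such a reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an async job's wait context (not an OpenSSL type). */
typedef struct job {
    int refs;      /* one reference for the session, one per in-flight request */
    int cancelled; /* set when the session is shut down before completion */
    int wake_fd;   /* descriptor a completion callback would signal */
} job_t;

static int live_jobs = 0; /* number of allocated, not yet freed jobs */

static job_t *job_new(int fd) {
    job_t *j = malloc(sizeof *j);
    if (j == NULL) return NULL;
    j->refs = 1;
    j->cancelled = 0;
    j->wake_fd = fd;
    live_jobs++;
    return j;
}

/* Drop one reference; free the job when the last reference is gone. */
static void job_release(job_t *j) {
    assert(j->refs > 0);
    if (--j->refs == 0) {
        free(j);
        live_jobs--;
    }
}

/* A request is handed to the (simulated) accelerator: pin the job. */
static void job_submit(job_t *j) { j->refs++; }

/* The session is torn down early, e.g. after a handshake timeout. */
static void session_shutdown(job_t *j) {
    j->cancelled = 1;
    job_release(j); /* the in-flight request still holds a reference */
}

/* Late completion callback: returns 1 if the job was woken, 0 if skipped. */
static int on_hw_complete(job_t *j) {
    int woke = 0;
    if (!j->cancelled) {
        woke = 1; /* a real engine would signal j->wake_fd here */
    }
    job_release(j); /* drops the reference taken in job_submit */
    return woke;
}

/* Timeout fires first, completion arrives afterwards. */
static int simulate_timeout_then_completion(void) {
    job_t *j = job_new(7);
    if (j == NULL) return -1;
    job_submit(j);
    session_shutdown(j);      /* j stays valid: one reference remains */
    return on_hw_complete(j); /* skips the wake-up, then frees j */
}
```

In this model the completion callback never touches freed memory, because the in-flight request holds its own reference and the wake-up is skipped once the session has been cancelled. Whether the real async job objects can be pinned in this way is an open question in this thread.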
@zspirate Thanks for the information. We will check and come back on this. BTW, what version of QAT Engine are you using?
@zspirate In order to help us recreate your problem and to check whether your version of QAT engine is missing any fixes in this area in later releases of QAT engine, could you provide complete version info for QAT Engine, nginx and QAT driver? Thanks.
@zspirate Also, can you please use the latest OpenSSL version (1.1.0l or 1.1.1f)? A similar issue was fixed in this commit 6038.
I use the latest version of QAT Engine. OpenSSL is 1.1.1b, nginx is 1.16 (with the Intel QAT async patch), and the driver is 1.7.0.
Hi @zspirate We are currently looking into this. Please could you forward the QAT driver config files you used when this core dump was created together with the nginx.conf file (as an attachment). Many thanks in advance. paulturx
Here are the conf files.
Hi @zspirate We would be very interested to see whether you are able to reproduce the problem on your set-up with the nginx.conf parameter 'multi_accept' set to 'off' (or else not specifically set at all since the default is 'off') and get back to us with the results. Thanks in advance, paulturx
It works! No core dump anymore. But I still don't understand why the 'multi_accept' parameter has this effect.
I would also like to understand this better. Is there a limit to the number of connections accepted? Is this related to how qat_pause_job()/qat_wake_job() behave?
The problem has occurred again. I'm confused.