bssl: Could not obtain wait context for job
Hello, I've observed a sigsegv under the described conditions: an error path which does not clear the current job/tlv.
I added some instrumentation to my application with the intel qat engine:
// rsa decrypt error
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489b8
h2o[151965]: [/build/deps/neverbleed/neverbleed.c] bssl_offload_decrypt:RSA decrypt failure:-:error:04000070:RSA routines:OPENSSL_internal:DATA_LEN_NOT_EQUAL_TO_MOD_LEN ossl err qat_hw_rsa.c 1664
h2o[151965]: [openssl-privsep] RSA decrypt failure
h2o[151965]: DEBUG [bssl_offload_free_request:698] nb finish job:0x6030002489b8
h2o[151965]: DEBUG [bssl_qat_async_finish_job:500] clearing ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268
// rsa sign
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489e8 job:0x6060001e32c8 waitctx:0x6060001e3328
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489e8
h2o[151965]: DEBUG [bssl_offload_digestsign:1418] rsa:0x611000059c88
h2o[151965]: DEBUG [qat_rsa_priv_enc:1021] here
h2o[151965]: DEBUG [qat_rsa_priv_enc:1106] here error
h2o[151965]: DEBUG [qat_rsa_priv_sign:1635] len:0, ASYNC_get_current_job(): 0x6060001e3208, (nil)
h2o[151965]: received SIGSEGV: si_code=1 (SEGV_MAPERR: Address not mapped to object) si_addr=(nil)
h2o[151965]: received SIGSEGV (cont): rip=(nil) rsp=0x7fe1bab6b688 rbp=(nil)
We can see that ASYNC_get_current_job(): 0x6060001e3208 which is the previous job, the failed rsa decryption.
Here's how the API is being used at a high level which causes the issue
// rsa decrypt
async_ctx = bssl_qat_async_start_job()
bssl_qat_async_save_current_job // --> set's the current job
meth = bssl_engine_get_rsa_method();
meth->decrypt(...)
// in_len != rsa_size --> OPENSSL_PUT_ERROR, return error
bssl_qat_async_finish_job(async_ctx)
// note: the current job is still set here...
// rsa sign
async_ctx = bssl_qat_async_start_job()
meth = bssl_engine_get_rsa_method();
meth->sign_raw(...)
qat_rsa_priv_sign
qat_rsa_priv_enc
qat_rsa_decrypt
qat_init_op_done
job = ASYNC_get_current_job() // this is the previous job, which has been freed/zero'd
qat_setup_async_event_notification(job)
ASYNC_get_wait_ctx(job) is NULL --> WARN("Could not obtain wait context for job\n");
ASYNC_current_job_last_check_and_get // this is the previous job, which has been freed/zero'd
job->tlv_destructor // --> SIGSEGV
Engine Logs:
[DEBUG][4211829.817191] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:198:qat_rsa_decrypt()] - Started
[WARN][4211829.817195] PID [304707] Thread [7ff7a2c5e700][qat_events.c:129:qat_setup_async_event_notification()] Could not obtain wait context for job
[WARN][4211829.817200] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:210:qat_rsa_decrypt()] Failed to setup async event notifications
[WARN][4211829.817206] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:1017:qat_rsa_priv_enc()] Failure in qat_rsa_decrypt fallback = 0
I can see two potential solutions:
- (diff) one would be to call
_ret = ASYNC_current_job_last_check_and_get();before those earlier returns - call tlv_destructor (if set) in
bssl_qat_async_finish_job
Any thoughts?
@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.
@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.
PR is here: https://github.com/intel/QAT_Engine/pull/291/files
Thanks!
@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.
PR is here: https://github.com/intel/QAT_Engine/pull/291/files
Thanks!
Thanks We will check and let you know.
@sharksforarms , We followed the steps shared by you and we are not able to see the issue in our reproduction. Do you have any test application to reproduce the scenario? Also what is the boringSSL version you have used?