mosquitto loop exits on TCP handshake failure for TLS
mosquitto v2.0.15, platform: Linux. Using the client library in threaded async mode with a TLS connection to the broker.
If the server responds with a TCP RST during the initial TCP connection setup (e.g. because the broker application is down), the mosquitto thread quite often exits. The client then has to be restarted manually, which is infeasible because the library does not even notify the host application about this premature exit.
The exit point is in mosquitto_loop_forever(), where it handles the return code of mosquitto_loop(): https://github.com/eclipse/mosquitto/blob/master/lib/loop.c#L276 (rc is MOSQ_ERR_ERRNO and errno is EPROTO).
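To make the exit condition concrete, here is a minimal, self-contained model of the decision as described in this report. It is a sketch, not the code in lib/loop.c: the enum and function names are hypothetical stand-ins, and only the two return codes discussed in this thread are modeled.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two libmosquitto return codes discussed
 * in this thread; they are not the library's real definitions. */
enum loop_model_rc {
    LOOP_MODEL_ERR_CONN_LOST,
    LOOP_MODEL_ERR_ERRNO
};

/* What the modeled loop does after mosquitto_loop() reports a failure. */
enum loop_model_action {
    LOOP_MODEL_RECONNECT,
    LOOP_MODEL_EXIT
};

/* Simplified model of the check described above: an errno-style failure
 * with errno == EPROTO ends the loop, anything else modeled here leads to
 * a reconnect attempt. */
enum loop_model_action loop_model_next_action(enum loop_model_rc rc, int saved_errno)
{
    if (rc == LOOP_MODEL_ERR_ERRNO && saved_errno == EPROTO) {
        return LOOP_MODEL_EXIT; /* the case reported in this issue */
    }
    return LOOP_MODEL_RECONNECT;
}
```

In this model, the failing scenario corresponds to loop_model_next_action(LOOP_MODEL_ERR_ERRNO, EPROTO), which yields LOOP_MODEL_EXIT, while the same return code with ECONNREFUSED would lead to a reconnect.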
It seems to me that the problem originates in net__handle_ssl() (called by net__read()). In this scenario SSL_get_error() returns SSL_ERROR_SYSCALL, since the underlying TCP socket got an ECONNREFUSED. That error is no longer handled (it was before the commit mentioned below), so the function returns -1 and sets errno to EPROTO, which triggers the loop exit condition above.
This bug was possibly introduced by this commit: https://github.com/eclipse/mosquitto/commit/e979a46c048a8c60c53614548c4a98dfd4992cf4
I can confirm that this commit https://github.com/eclipse/mosquitto/commit/e979a46c048a8c60c53614548c4a98dfd4992cf4 introduced this issue (because SSL_ERROR_SYSCALL is no longer handled during connection setup). I successfully tested the same scenario with v2.0.14; no problems there.
Yes, we face the same issue. For now we have downgraded to 2.0.14 and are waiting for the issue to be fixed.
The SSL_get_error man page says:
On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL, nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the returned error is SSL_ERROR_SSL with a meaningful error on the error stack (SSL_R_UNEXPECTED_EOF_WHILE_READING). This error reason code may be used for control flow decisions (see the man page for ERR_GET_REASON(3) for further details on this).
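As a rough illustration of that version split, the check for an unexpected EOF could be modeled like this. This is only a sketch: the struct, enum, and function names are hypothetical, and the fields stand in for values a real client would obtain from SSL_get_error(), errno, and the OpenSSL error stack.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SSL_get_error() results relevant here. */
enum eof_ssl_error {
    EOF_SSL_ERROR_SSL,
    EOF_SSL_ERROR_SYSCALL,
    EOF_SSL_ERROR_OTHER
};

/* Snapshot of what a client could observe after a failed SSL read. */
struct eof_ssl_failure {
    int openssl_major;             /* e.g. 1 or 3 */
    enum eof_ssl_error err;        /* stand-in for the SSL_get_error() result */
    int saved_errno;               /* errno captured right after the failure */
    bool error_stack_empty;        /* nothing was added to the error stack */
    bool reason_is_unexpected_eof; /* stack reason is SSL_R_UNEXPECTED_EOF_WHILE_READING */
};

/* Returns true when the failure matches the man page's description of an
 * unexpected EOF for the given OpenSSL major version. */
bool eof_is_unexpected_eof(const struct eof_ssl_failure *f)
{
    if (f->openssl_major < 3) {
        /* Before 3.0: SSL_ERROR_SYSCALL, empty error stack, errno 0. */
        return f->err == EOF_SSL_ERROR_SYSCALL
            && f->error_stack_empty
            && f->saved_errno == 0;
    }
    /* Since 3.0: SSL_ERROR_SSL with the unexpected-EOF reason code. */
    return f->err == EOF_SSL_ERROR_SSL && f->reason_is_unexpected_eof;
}
```

Note that a SSL_ERROR_SYSCALL with a non-zero errno (such as the ECONNREFUSED case from the original report) does not count as an unexpected EOF under this description and would need separate handling.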
As stated in other issues, such as https://github.com/eclipse/mosquitto/issues/2767 , it seems the library broke reconnection on EOF in https://github.com/eclipse/mosquitto/commit/e979a46c048a8c60c53614548c4a98dfd4992cf4 . It is broken from v2.0.15 to v2.0.18.
Looking at the library implementation in v2.0.18 (net__handle_ssl in lib/net_mosq.c), there could be checks for non-fatal SSL errors (SSL_ERROR_ZERO_RETURN, SSL_ERROR_NONE, SSL_ERROR_SSL, which was SSL_ERROR_SYSCALL with errno 0 for OpenSSL < 3.0.0, and possibly others?) for which the client returns MOSQ_ERR_CONN_LOST, which would trigger the automatic reconnection. Currently, if an SSL error is caused by the server side (proxy down, EOF, empty reply, firewall rule added), errno is set to EPROTO and mosquitto_loop_forever exits as if the error were fatal.
Treating these non-fatal SSL errors as MOSQ_ERR_CONN_LOST and the fatal SSL errors as MOSQ_ERR_TLS could potentially allow removing the 'errno == EPROTO' check in mosquitto_loop_forever, which I find tricky to handle in async implementations, but I am unsure whether it is used for another purpose that I'm not aware of.
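A minimal sketch of the classification I have in mind, assuming the error codes listed above are the non-fatal ones. All names here are hypothetical stand-ins rather than libmosquitto or OpenSSL definitions, and treating every SSL_ERROR_SYSCALL as a lost connection is my own assumption (it would cover the ECONNREFUSED case from the original report).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SSL_get_error() results discussed above. */
enum cls_ssl_error {
    CLS_SSL_ERROR_NONE,
    CLS_SSL_ERROR_ZERO_RETURN,
    CLS_SSL_ERROR_SYSCALL,
    CLS_SSL_ERROR_SSL
};

/* Hypothetical stand-ins for MOSQ_ERR_CONN_LOST and MOSQ_ERR_TLS. */
enum cls_result {
    CLS_CONN_LOST, /* non-fatal: let the loop reconnect */
    CLS_TLS_FATAL  /* fatal: let the loop exit */
};

/* Maps an SSL failure to the proposed return code. `unexpected_eof` says
 * whether the error stack reports an unexpected EOF (OpenSSL >= 3.0). */
enum cls_result cls_classify(enum cls_ssl_error err, bool unexpected_eof)
{
    switch (err) {
    case CLS_SSL_ERROR_NONE:
    case CLS_SSL_ERROR_ZERO_RETURN:
    case CLS_SSL_ERROR_SYSCALL: /* assumption: transport-level failure */
        return CLS_CONN_LOST;
    case CLS_SSL_ERROR_SSL:
        /* Only the unexpected-EOF flavour is treated as recoverable. */
        return unexpected_eof ? CLS_CONN_LOST : CLS_TLS_FATAL;
    }
    return CLS_TLS_FATAL; /* unknown codes stay fatal */
}
```

With a mapping like this, the loop would only need to look at the return code, and the errno comparison would no longer be needed for the TLS path.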
Is this issue fixed? Thanks
Random guess: Nope.
Any reason why you think this should be fixed already?