mosquitto thread does not exit if currently reconnecting
Hi, I found a scenario in which the mosquitto_loop_stop(), called with force=false, never returns, leaving the hosting application stuck forever: mosquitto_loop_stop() remains blocked on pthread_join() call, because the network thread does not exit even if mosquitto_disconnect() has been called prior to mosquitto_loop_stop().
Issue seen in lib 1.6.12 in threaded mode on Linux - but, by inspecting the code, I would say that this issue affects current master's head, too.
I analyzed the runtime behavior by adding logs to the lib and found out why this is happening. Please consider this sequence: 1.. network thread is blocked (for some seconds) inside net__socket_connect(blocking=true) - called by mosquitto__reconnect() (which is in turn called by mosquitto_loop_forever()). This happens e.g. if broker is currently not reachable anymore, so the thread is trying to reconnect - thus net__socket_connect() waits for connection establishment. 2.. at the very same time, application calls mosquitto_disconnect(), which sets client state to mosq_cs_disconnected (which is the condition checked by the network thread to break its loop and exit). A good way to call mosquitto_disconnect() at the right moment to trigger the issue is to add logs before and after the net__socket_connect() (see step above). 3.. then, network thread returns from the net__socket_connect() call, with rc != 0 (broker is still unreachable) - and so client state is reset to mosq_cs_connect_pending:
{
rc = net__socket_connect(mosq, mosq->host, mosq->port, mosq->bind_address, blocking);
}
if(rc>0){
mosquitto__set_state(mosq, mosq_cs_connect_pending);
return rc;
}
this irreparably overrides the exit request by API of step 2. 4.. as a consequence, network thread does not exit (state is not mosq_cs_disconnected anymore) 5.. application now calls mosquitto_loop_stop(force=false), which remains stuck forever on the pthread_join() call - thread is not breaking its loop.
As a real world example, steps 2+5 are a very normal application shutdown sequence, which then may cause application hang while exiting with broker currently unreachable.
Personal thought: I think that the tecnique of setting the state from API to make the thread exit is very risky, given the continuos re-set of the state performed by the network thread on itself. Indeed, I suspect that the above sequence may not be the only problematic one. A different method would be required to be sure that the network thread always exits when requested. Since mosquitto has already a socktpair for application thread to network thread communication, an idea would be to use that to signal the exit request (e.g. by writing a "special value" byte - currently the lib writes always a zero byte value). Then, when reading from mosq->sockpairR in mosquitto_loop() and interruptible_sleep(), the network thread may check the read byte value and signal "exit from API request" condition to mosquitto_loop_forever() with a new specific rc value - to break its loop. Alternatively, a global "thread exit requested" flag may be set by API and read by the thread, in a mutexed context.