esp-mqtt Feature request: Separate locking of outbox (IDFGH-7921)

As discussed in https://github.com/espressif/esp-mqtt/issues/230, there are a variety of situations where having separate locks for outbox operations would be desirable.

In the case of esp_mqtt_client_enqueue, if the client is locked for extended periods of time due to waiting on esp_transport_write / network timeout, new messages cannot be enqueued quickly from an external task when locks are enabled. A suitable workaround was discussed, but is a bit ugly from a user implementation perspective (e.g. having to define a message container type, alloc, dealloc, implement user events). Having a separate lock that for outbox for msg enqueue/dequeue would result in something that is much closer to a non-blocking API.
As discussed in this comment, having a separate outbox lock would allow for the user to implement additional outbox logic such as limiting the number of in-flight messages, checking if a certain msg_id was delivered yet, etc.

On first glance it seems this might be tricky to implement, since outbox_enqueue, outbox_set_pending, etc are often referencing pending_msg_id.

But thinking about it more, all of the outbox APIs are already pretty self-contained, and enqueueing new messages would really only need to know about connection->last_message_id which could be moved to the outbox handle. Then perhaps if the outbox was also used for qos0 messages, it would be possible to call mqtt_client_enqueue_priv without necessarily needing to lock the client, since it appears the values are just assigned to client->mqtt_state, and then copied back over in mqtt_enqueue.

The caveats I see are mqtt_resend_queued uses a dequeued outbox item, which might cause issues. And also not sure how about logic around fragmented messages. It might be easier to make it so that esp_mqtt_client_enqueue does not use mqtt_client_enqueue_priv and instead just creates a outbox_message_t and calls outbox_enqueue.

Jul 26 '22 18:07 someburner

Hi @someburner, after some work in this issue it proved to be harder than originally expected. As you pointed, to reach the outbox we need access to information from the client data structure and to do it safely we would need to do some refactorings, that we might do at some point.

I'll keep this open, since the feature is something we may add in the future.

Jun 27 '23 14:06 euripedesrocha

@euripedesrocha

Thanks! And yes I still think this would be a good feature.

For anyone else needing this functionality, here is a patch (on an older master) I have done to achieve (what I believe to be) a safe and mostly non blocking approach with the current state of the repo. And perhaps these will help inform @euripedesrocha about the intended use case for such a feature.

To recap: In my application I must limit the number of in-flight QoS1 messages to 1 in order to achieve in-order reception at the broker. MQTT by default does not guarantee order of QoS1, but capping number of in-flight messages is a feature of many client libraries. QoS2 is far too much overhead.

And here's what I do:

esp-mqtt changes

mqtt.patch.txt

Only allow 1 QoS1 message in-flight at any point in time. In patch above, you see if the QoS1 message is a subscribe type it gets priority over publish payloads. Note that in this case, I make the assumption that the only type of QoS1 payloads will ever be a subscribe or publish, and I limit number of QoS1 publishes at the application level, so if there's more than 1, it must be a subscribe. For this I added a outbox_get_qos1_count helper method. Maybe not the best approach but it works.
Don't delete any QoS1 messages from outbox ever. Again, needed for my use case, I'm OK with deleting QoS0 eventually though.
Originally I was using outbox_get_qos1_count for the application to see if it could send the next QoS1 pkt, but then this ran into the same problems with locking. Thus, the rest of the logic has to be implemented on the user side of code. The one exception is I had to change esp_mqtt_dispatch_custom_event to use portMAX_DELAY instead of 0 since I was occasionally missing esp-mqtt user events, which in this case is not acceptable.

user changes

Basically it is just a thread-safety layer between esp-mqtt client callbacks and user code. Honestly, it's quite a lot of work, so I will just describe an overview of what has to be done, and if anyone wants more detail feel free to comment.

Use esp_mqtt_client_register_event to register an event handler for all esp-mqtt events.
Define your own MQTT event data wrapper, to pass whatever you need from the esp-mqtt thread to your external thread
Create xQueues to send/receive your custom data type in a thread-safe manner
use the MQTT_EVENT_USER for publishing and subscribing

So instead of enqueuing directly with a call the client, I create an MQTT_EVENT_USER that has all my publish data (topic, data, qos, etc), and send that to the MQTT event handler. Then inside the event handler, which is run inside the esp-mqtt thread, that is where the actual publish or subscribe is called.

If I want to do a QoS1 publish, in the external user thread I increment a qos1_inflight_count++. When I get a puback event (propagated from esp-mqtt to my own external thread) I decrement that. Then my external thread can safely check what qos1_inflight_count is.

TLDR

It's ton of work and overhead if you want non-blocking interaction with esp-mqtt.

That said, I have been testing with this approach for a long time and it appears to work well, so at least there's that.

Jun 27 '23 21:06 someburner

@someburner thanks for the feedback, I'll look into the control of in flight messages, and if it should be added as a general option to esp-mqtt.

I think that for this particular problem, the solution would be to have a custom outbox controlling the number of QoS1 messages would be a simpler solution. In the latest master, we changed the internal struct for the outbox in a way that I believe it's easier to achieve your goal.

Jun 28 '23 05:06 euripedesrocha

I believe regardless of outbox being used, there is still the issue of API locks, no? In my case I can't block for very much time at all trying to acquire a lock that is being held by esp-mqtt thread. The internal structure of the outbox is a rather minor issue.

I looked at the commits and don't see anything that looks like it might result in less blocking or help with accessing shared resources better. Is there something I'm missing?

I'll look into the control of in flight messages

Having some built-in functionality for this is only "nice to have" (and should be rather simple IMO). The real issue is being able to know, from another thread, with little to no blocking, what the status of the messages in the outbox are.

Iterating over a short linked list should be pretty fast. But because esp-mqtt uses one big mutex for just about everything, it's hard to make predictions about access times with the APIs.

I will say the event loop does work quite well for these sorts of things, but it just feels really messy to have to do things this way. When I get some time I will make more of an effort to get acquainted with the code and see if I can come up with any alternatives.

Jun 28 '23 07:06 someburner

Hi @someburner, regarding the issues with API locks you are correct, we have issues that we'll work to fix in the near future.

At the moment, the solution using the event loop is the way to go.

Jul 10 '23 07:07 euripedesrocha

esp-mqtt esp-mqtt copied to clipboard

Feature request: Separate locking of outbox (IDFGH-7921)

esp-mqtt changes

user changes

TLDR

esp-mqtt
esp-mqtt copied to clipboard