OpenMQTTGateway icon indicating copy to clipboard operation
OpenMQTTGateway copied to clipboard

Fixed data corruption associated with jsonQueue enqueueing and dequeuing

Open puterboy opened this issue 1 year ago • 6 comments

  • Changed jsonQueue structure and associated routines to use json String rather than json Object
  • Added Mutex protection to emptyQueue()
  • Moved Mutex protection to within enqueueJsonObject() to significantly shorten lock time

Description:

This fixes a severe bug whereby any MQTT messages enqueued on the jsonQueue were subject to unpredictable data corruption if the encoded json documents were more than one level deep since pushing/popping onto & from the queue do only shallow copies. This corruption could happen even with a queue size of one if the memory is overwritten. In practice, this caused sporadic data MQTT message corruption for me as discussed in https://github.com/1technophile/OpenMQTTGateway/issues/2014

  1. NOTE YOU SHOULD ALSO BE ABLE TO REVERT THE REVERT THE CHANGE IN ZgatewayRTL_433.ino that called pub directly instead of using the JSON queue It's pretty likely that this PR fixes the cause of the instability that others experienced.
  2. NOTE THAT THIS PR SUPERSEDES https://github.com/1technophile/OpenMQTTGateway/pull/2025

Specifically, the PR does the following:

  • Changes the format of the queue (and the associated actions enqueueJsonObject() and emptyQueue()) to push/pull JSON strings rather than Documents onto the queue
  • Extends the Mutex protection (xQueueMutex) that was previously applied only to enqueuing als to emptyQueue()
  • Shortens the lock time by moving the Mutex for enqueueing into the enqueueJsonObject() routine rather than handleJsonObject() since only the actual push needs to be protected while the deserializing and other actions don't

Note this leaves handleJsonObject() as a stub that just calls eqnueueJsonObject().

Checklist:

  • [X] The pull request is done against the latest development branch
  • [X] Only one feature/fix was added per PR and the code change compiles without warnings
  • [X] I accept the DCO.

puterboy avatar Aug 28 '24 17:08 puterboy

Note now that you are emptying the queue with jsonDispatch that assumes an origin key, you could save some processes if you stored the origin and the rest of the json string on the queue and then called a new version of jsonDispatch that takes a topic and a json string. Perhaps push origin, then push string. Then pop string, then origin. If you do this then the check for 'origin' should be when you enqueue and doesn't need to be done when you dequeue.

Currently emptyQueue serialize the json string and then jsonDispatch de-serializes it - so you could save this round-trip.

puterboy avatar Aug 28 '24 17:08 puterboy

Great, thanks for this!

What do you think about using a char array instead of a String ?

1technophile avatar Aug 28 '24 18:08 1technophile

That would of course be fine.

puterboy avatar Aug 28 '24 18:08 puterboy

Please also look at my note above regarding creating a new version of jsonDispatch to avoid having to serialize and deserialize.

In that vein, it may make sense to create a lower level enqueue that took a pair of strings: origin, jsonString and push them successively on the queue. Perhaps call it enqueueJsonString

Then a higher level enqueue routine could take just a jsonDocument, extract & remove the origin key & value, serialize the remaining, and call the lower-level enqueue routine. That higher level routine would be the only that would need the error checking to see if origin key exist.

For emptyQueue, after extracting the origin and content strings, they would be sent to a version of jsonDispatch that accepts such a pair. Note: I think emptyQueue is a bit of a misnomer since it really only removes one element from the queue. Maybe it should be called dequeueJsonString

Also you might want to remove the now unnecessary handleJsonEnqueue wrapper - you may though want to create an overloaded version of enqueueJsonObject (or whatever it will be called) that doesn't require the timeout.

I am really getting beyond my skill level here - so now that I have a POC working, would you mind taking it from here to polish?

puterboy avatar Aug 28 '24 18:08 puterboy

would you mind taking it from here to polish?

Yes sure, I would need your help to retest after by the way

1technophile avatar Aug 28 '24 18:08 1technophile

Of course! Great platform! And kudos for all the work you have put into it...

puterboy avatar Aug 28 '24 18:08 puterboy

I will merge this one and start from there. Thanks!

1technophile avatar Aug 29 '24 12:08 1technophile