micromdm Race condition in command queue

Race condition in command queue

Open jessepeterson opened this issue 5 years ago • 2 comments

What version of `micromdm` are you using?

HEAD on master

What `micromdm` command did you run?

Extensive use of API sending commands, etc.

What did you expect to see?

Correct commands sent to device.

What did you see instead?

Commands failing to get queued for the device (and subsequently sent out).

When a device's DeviceCommand struct gets too big — usually with too many completed commands — the time to load and save that data gets longer and longer. This raises a race condition between queuing commands (which is a load and save operation) and de-queueing/popping commands for sending to the device (which is also a load & save op). Essentially the two load operations happen before a single save operation and you get overwritten & incorrect DeviceCommand blocks per device.

Removing completed commands restores quick load & writes restores normal operations for me. The foreshadowing of these line came true:

https://github.com/micromdm/micromdm/blob/91c236c8c3a2fa39d02f3e0932ee009e31ad8a48/platform/queue/device_command.go#L30-L33

Dec 27 '18 23:12 jessepeterson

As a temporary workaround until a longer term solution is come up with, this little patch will clear the completed command queue every time the command queue is popped:

--- a/platform/queue/queue.go
+++ b/platform/queue/queue.go
@@ -118,6 +118,8 @@ func (db *Store) nextCommand(ctx context.Context, resp mdm.Response) (*Command,
                }
        }
 
+       dc.Completed = []Command{}
+
        if err := db.Save(dc); err != nil {
                return nil, err
        }

With the device struct being a much more manageable size (by not having every completed command saved) the save and load operations from BoltDB happen quick enough that pushing/popping commands no longer race.

Feb 06 '19 21:02 jessepeterson

Could this be the case for my endless loop with the enrollment profile getting removed as soon as the deviceComplete is sent? Why do one want to save the completed que when web hooks are there to answer every que event? At least there should be an cleaning operation on a schedule or by command.

Nov 11 '19 08:11 tomaswallentinus

micromdm micromdm copied to clipboard

Race condition in command queue

What version of micromdm are you using?

What micromdm command did you run?

What did you expect to see?

What did you see instead?

micromdm
micromdm copied to clipboard

What version of `micromdm` are you using?

What `micromdm` command did you run?