micromdm
micromdm copied to clipboard
Race condition in command queue
What version of micromdm
are you using?
HEAD on master
What micromdm
command did you run?
Extensive use of API sending commands, etc.
What did you expect to see?
Correct commands sent to device.
What did you see instead?
Commands failing to get queued for the device (and subsequently sent out).
When a device's DeviceCommand
struct gets too big — usually with too many completed commands — the time to load and save that data gets longer and longer. This raises a race condition between queuing commands (which is a load and save operation) and de-queueing/popping commands for sending to the device (which is also a load & save op). Essentially the two load operations happen before a single save operation and you get overwritten & incorrect DeviceCommand
blocks per device.
Removing completed commands restores quick load & writes restores normal operations for me. The foreshadowing of these line came true:
https://github.com/micromdm/micromdm/blob/91c236c8c3a2fa39d02f3e0932ee009e31ad8a48/platform/queue/device_command.go#L30-L33
As a temporary workaround until a longer term solution is come up with, this little patch will clear the completed command queue every time the command queue is popped:
--- a/platform/queue/queue.go
+++ b/platform/queue/queue.go
@@ -118,6 +118,8 @@ func (db *Store) nextCommand(ctx context.Context, resp mdm.Response) (*Command,
}
}
+ dc.Completed = []Command{}
+
if err := db.Save(dc); err != nil {
return nil, err
}
With the device struct being a much more manageable size (by not having every completed command saved) the save and load operations from BoltDB happen quick enough that pushing/popping commands no longer race.
Could this be the case for my endless loop with the enrollment profile getting removed as soon as the deviceComplete is sent? Why do one want to save the completed que when web hooks are there to answer every que event? At least there should be an cleaning operation on a schedule or by command.