fawkes
fawkes copied to clipboard
clips-executive: release lock if we suddenly become the owner
We encountered a case, where the local mutex fact was saying that another robot held the lock, hence the own mutex request was rejected. After the rejection went through we finally got the update about the request that was sent asynchronously (prior to the requection). Hence the mutex fact changed from (mutex (state LOCKED) (locked-by "other") (request NONE) to (mutex (state LOCKED) (locked-by "us") (request NONE) which is of course not wanted.
14:02:48.340853 CLIPS (executive): FIRE 25 mutex-trigger-event: f-1906,f-65681,*
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType" : "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.340906 CLIPS (executive): <== f-63573 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Upsilan") (lock-time 1648728167 626000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))
D 14:02:48.340913 CLIPS (executive): ==> f-65710 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Set") (lock-time 1648728167 656000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))
Hm at least the example log that you posted shows a different issue:
"updateDescription": {
"updatedFields": {
"locked": false
},
"removedFields": [
"lock-time",
"locked-by"
]
}
But at the same time:
{
"fullDocument": {
"_id": "resource-PROMISE-C-RS1-OUTPUT",
"locked": true,
"lock-time": {
"$date": 1648728167656
},
"locked-by": "Set"
}
}
So it seems like fullDocument
contains the document before the update (I think this can be configured when registering the trigger). May this be the actual issue?
That would make much more sense, i already wondered why the UpdateDescription differs from the doc... So far we always process a trigger by applying the data stored in the fullDocument.
(bind ?doc (bson-get ?obj "fullDocument"))
(bind ?id (sym-cat (bson-get ?doc "_id")))
(if ?id then
(bind ?locked FALSE)
(bind ?locked-by "")
(bind ?lock-time (create$ 0 0))
(bind ?locked (bson-get ?doc "locked"))
(if (bson-has-field ?doc "locked-by") then
(bind ?locked-by (bson-get ?doc "locked-by"))
)
(if (bson-has-field ?doc "lock-time") then
(bind ?lock-time (bson-get-time ?doc "lock-time"))
)
But did anything change or did we always only acted upon outdated data?
Also, looking into the log again, i find the trigger history for this resource a bit strange:
:g/Trigger.*resource-PROMISE-C-RS1-OUTPUT
D 14:02:05.021218 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245983C000000152B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "insert", "clusterTime" : { "$timestamp" : { "t" : 1648728124, "i" : 21 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728124928 }, "locked-by" : "Icks" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" } }
D 14:02:36.206661 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245985C000000032B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728156, "i" : 3 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : false }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_i
d" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:47.679170 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000172B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 23 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167626 }, "locked-by" : "Upsilan" }, "ns" : { "db
" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167626 }, "locked" : true, "locked-by" : "Upsilan" },
"removedFields" : [ ] } }
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.341151 CLIPS (executive): Trigger: { "_id" : { "_data" : "82624598670000002B2B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 43 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167656 }, "locked" : true, "locked-by" : "Set" }, "remove
dFields" : [ ] } }
In particular, i wonder why after a Trigger containing the full Document with locked-by Upsilan
and updateField locked-by Upsilan
we get the next Trigger containing a full Document with locked-by Set
and removed-Field locked-by
.
Did we miss a trigger here?
Looking at the doc of mongocxx change streams, the issue might be the following:
When set to ‘updateLookup’, the change stream will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. This will be stored in the "fullDocument" field of the notification.
(emphasis mine).
So it may be that we get a trigger, but the document that is sent with the trigger already contains data from some subsequent change.