fawkes icon indicating copy to clipboard operation
fawkes copied to clipboard

clips-executive: release lock if we suddenly become the owner

Open TarikViehmann opened this issue 2 years ago • 4 comments

We encountered a case, where the local mutex fact was saying that another robot held the lock, hence the own mutex request was rejected. After the rejection went through we finally got the update about the request that was sent asynchronously (prior to the requection). Hence the mutex fact changed from (mutex (state LOCKED) (locked-by "other") (request NONE) to (mutex (state LOCKED) (locked-by "us") (request NONE) which is of course not wanted.

 14:02:48.340853 CLIPS (executive): FIRE   25 mutex-trigger-event: f-1906,f-65681,*
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType" : "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.340906 CLIPS (executive): <== f-63573 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Upsilan") (lock-time 1648728167 626000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))
D 14:02:48.340913 CLIPS (executive): ==> f-65710 (mutex (name resource-PROMISE-C-RS1-OUTPUT) (state LOCKED) (locked-by "Set") (lock-time 1648728167 656000) (request NONE) (pending-requests) (response NONE) (auto-renew TRUE) (error-msg ""))

TarikViehmann avatar Mar 31 '22 18:03 TarikViehmann

Hm at least the example log that you posted shows a different issue:

  "updateDescription": {
    "updatedFields": {
      "locked": false
    },
    "removedFields": [
      "lock-time",
      "locked-by"
    ]
  }

But at the same time:

{
  "fullDocument": {
    "_id": "resource-PROMISE-C-RS1-OUTPUT",
    "locked": true,
    "lock-time": {
      "$date": 1648728167656
    },
    "locked-by": "Set"
  }
}

So it seems like fullDocument contains the document before the update (I think this can be configured when registering the trigger). May this be the actual issue?

morxa avatar Apr 01 '22 06:04 morxa

That would make much more sense, i already wondered why the UpdateDescription differs from the doc... So far we always process a trigger by applying the data stored in the fullDocument.

  (bind ?doc (bson-get ?obj "fullDocument"))                                                                           
  (bind ?id (sym-cat (bson-get ?doc "_id")))                                                                           
  (if ?id then                                                                                                         
    (bind ?locked FALSE)                                                                                               
    (bind ?locked-by "")                                                                                               
    (bind ?lock-time (create$ 0 0))                                                                                    
                                                                                                                       
    (bind ?locked (bson-get ?doc "locked"))                                                                            
    (if (bson-has-field ?doc "locked-by") then                                                                         
      (bind ?locked-by (bson-get ?doc "locked-by"))                                                                    
    )                                                                                                                  
    (if (bson-has-field ?doc "lock-time") then                                                                         
      (bind ?lock-time (bson-get-time ?doc "lock-time"))                                                               
    ) 

But did anything change or did we always only acted upon outdated data?

TarikViehmann avatar Apr 01 '22 07:04 TarikViehmann

Also, looking into the log again, i find the trigger history for this resource a bit strange:

:g/Trigger.*resource-PROMISE-C-RS1-OUTPUT
D 14:02:05.021218 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245983C000000152B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "insert", "clusterTime" : { "$timestamp" : { "t" : 1648728124, "i" : 21 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728124928 }, "locked-by" : "Icks" }, "ns" : { "db" :
 "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" } }
D 14:02:36.206661 CLIPS (executive): Trigger: { "_id" : { "_data" : "826245985C000000032B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728156, "i" : 3 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : false }, "ns" : { "db" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_i
d" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:47.679170 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000172B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 23 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167626 }, "locked-by" : "Upsilan" }, "ns" : { "db
" : "robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167626 }, "locked" : true, "locked-by" : "Upsilan" },
 "removedFields" : [  ] } }
D 14:02:48.340873 CLIPS (executive): Trigger: { "_id" : { "_data" : "8262459867000000252B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 37 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "locked" : false }, "removedFields" : [ "lock-time", "locked-by" ] } }
D 14:02:48.341151 CLIPS (executive): Trigger: { "_id" : { "_data" : "82624598670000002B2B022C0100296E5A100490B566345E5A4E9EB31BF9DC848E6F9A463C5F6964003C7265736F757263652D50524F4D4953452D432D5253312D4F5554505554000004" }, "operationType"
: "update", "clusterTime" : { "$timestamp" : { "t" : 1648728167, "i" : 43 } }, "fullDocument" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT", "locked" : true, "lock-time" : { "$date" : 1648728167656 }, "locked-by" : "Set" }, "ns" : { "db" :
"robmem_coordination", "coll" : "mutex" }, "documentKey" : { "_id" : "resource-PROMISE-C-RS1-OUTPUT" }, "updateDescription" : { "updatedFields" : { "lock-time" : { "$date" : 1648728167656 }, "locked" : true, "locked-by" : "Set" }, "remove
dFields" : [  ] } }

In particular, i wonder why after a Trigger containing the full Document with locked-by Upsilan and updateField locked-by Upsilan we get the next Trigger containing a full Document with locked-by Set and removed-Field locked-by. Did we miss a trigger here?

TarikViehmann avatar Apr 01 '22 07:04 TarikViehmann

Looking at the doc of mongocxx change streams, the issue might be the following:

When set to ‘updateLookup’, the change stream will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. This will be stored in the "fullDocument" field of the notification.

(emphasis mine).

So it may be that we get a trigger, but the document that is sent with the trigger already contains data from some subsequent change.

morxa avatar Apr 01 '22 07:04 morxa