diego-release icon indicating copy to clipboard operation
diego-release copied to clipboard

Document expected locket related logs

Open ameowlia opened this issue 1 year ago • 2 comments

I am currently debugging an issue around locket and I was trying to figure out which logs are part of normal operation. It would great to document this better in the future. For now I am going to document it here.

Setup for testing "lock" locks

  • I have two VMs where BBS is deployed
    • control/9e859b27-ac41-4e39-ba59-bb87d520d987 👈 This VM started first. So the bbs on it has the lock.
    • control/fe89652e-4e59-4efc-b75c-88cc029ed816 👈 This VM started second. So the bbs on it does not have the lock.
# control/9e859b27-ac41-4e39-ba59-bb87d520d987
# bbs.stdout.log
# this bbs has the lock, so it is active. it is constantly logging about the work it is doing.
...
{"timestamp":"2023-06-22T19:13:05.362719089Z","level":"info","source":"bbs","message":"bbs.converge-lrps.db-converge-lrps.complete","data":{"session":"1875.2"}}
{"timestamp":"2023-06-22T19:13:05.366102972Z","level":"info","source":"bbs","message":"bbs.executing-convergence.converge-lrps-done","data":{"session":"1874"}}
...
# control/fe89652e-4e59-4efc-b75c-88cc029ed816
# bbs.stdout.log
# this bbs does not have the lock, so it is not active. this last log line is from when it attempted to grab the lock, but failed because a different bbs held the lock.
# at this point it is not actively logging because it is not doing any work.
# note that it tries to make a lock with owner 'fe89652e-4e59-4efc-b75c-88cc029ed816' (this VM), 
# but the lock already exists with owner '9e859b27-ac41-4e39-ba59-bb87d520d987' (the other VM with the bbs that has the lock).

{"timestamp":"2023-06-22T18:01:32.238648111Z","level":"error","source":"bbs","message":"bbs.locket-lock.failed-to-acquire-lock",
"data":{"error":"rpc error: code = AlreadyExists desc = lock-collision","lock":{"key":"bbs","owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","type":"lock","type_code":1},
"lock-owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","request-uuid":"7ca981d9-9375-45a8-7ddc-8940dc662643","session":"4","ttl_in_seconds":15}}
# locket.stdout.log
# There is a matching log in locket from where the bbs on control/fe89652e-4e59-4efc-b75c-88cc029ed81 attempted to grab the lock and failed
{"timestamp":"2023-06-22T18:01:32.238648111Z","level":"error","source":"bbs","message":"bbs.locket-lock.failed-to-acquire-lock",
"data":{"error":"rpc error: code = AlreadyExists desc = lock-collision","lock":{"key":"bbs","owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","type":"lock","type_code":1},
"lock-owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","request-uuid":"7ca981d9-9375-45a8-7ddc-8940dc662643","session":"4","ttl_in_seconds":15}}
# In the locket.lock db table there is one lock for bbs. And the owner is the VM guid of the VM where the bbs that has the lock is running.
# The 'modified_index' is constantly being sequentially updated.

mysql> select * from locket.locks where path = "bbs";
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+
| path | owner                                | value | type | modified_index | modified_id                          | ttl  |
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+
| bbs  | 9e859b27-ac41-4e39-ba59-bb87d520d987 |       | lock |           4352 | d6d8b4f2-8d1e-4ba0-71ca-fdbf6b0bad58 |   15 |
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+

Changing who has locks

If the bbs that currently has the lock restarts, then it will give up the lock and the other bbs will claim the lock. I did this by running monit restart bbs on control/9e859b27-ac41-4e39-ba59-bb87d520d987.

# locket.stdout.log
# The bbs on VM '9e859b27-ac41-4e39-ba59-bb87d520d987' gives up the lock
{"timestamp":"2023-06-22T19:16:05.620930708Z","level":"info","source":"locket","message":"locket.release.release-lock.released-lock","data":{"key":"bbs","owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","session":"1774728.1","type":"lock","type-code":1}}

# the bbs on VM 'fe89652e-4e59-4efc-b75c-88cc029ed816' successfully claims the lock.
{"timestamp":"2023-06-22T19:16:06.089322785Z","level":"info","source":"locket","message":"locket.lock.lock.acquired-lock","data":{"key":"bbs","owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","request-uuid":"7e2a16d0-56a7-40dc-5d04-ca7275364cc2","session":"1774735.1","type":"lock","type-code":1}}

# the bbs on VM '9e859b27-ac41-4e39-ba59-bb87d520d987' is restarted and attempts to claim the lock
{"timestamp":"2023-06-22T19:16:20.492246371Z","level":"info","source":"locket","message":"locket.lock.register-ttl.fetch-and-release-lock.fetched-lock","data":{"key":"bbs","modified-index":4431,"owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","request-uuid":"05c37d7c-4163-4b64-4d9f-30af73a0860d","session":"1774724.2.1","type":"lock","type-code":1}}

# the bbs on VM '9e859b27-ac41-4e39-ba59-bb87d520d987' fails to claim the lock, because the bbs on VM 'fe89652e-4e59-4efc-b75c-88cc029ed816' has the lock
{"timestamp":"2023-06-22T19:16:20.493343559Z","level":"error","source":"locket","message":"locket.lock.register-ttl.fetch-and-release-lock.fetch-failed-owner-mismatch","data":{"error":"rpc error: code = AlreadyExists desc = lock-collision","fetched-owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","key":"bbs","modified-index":4431,"owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","request-uuid":"05c37d7c-4163-4b64-4d9f-30af73a0860d","session":"1774724.2.1","type":"lock","type-code":1}}
{"timestamp":"2023-06-22T19:16:20.495265778Z","level":"error","source":"locket","message":"locket.lock.register-ttl.failed-compare-and-release","data":{"error":"rpc error: code = AlreadyExists desc = lock-collision","key":"bbs","modified-index":4431,"request-uuid":"05c37d7c-4163-4b64-4d9f-30af73a0860d","session":"1774724.2","type":"lock"}}
# the lock in the db is now updated with the new owner.

mysql> select * from locket.locks where path = "bbs";
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+
| path | owner                                | value | type | modified_index | modified_id                          | ttl  |
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+
| bbs  | fe89652e-4e59-4efc-b75c-88cc029ed816 |       | lock |             98 | cba003ea-608f-4cdf-539e-0304896dc313 |   15 |
+------+--------------------------------------+-------+------+----------------+--------------------------------------+------+
1 row in set (0.00 sec)
# control/9e859b27-ac41-4e39-ba59-bb87d520d987
# bbs.stdout.log
# this bbs has successfully restarted.
{"timestamp":"2023-06-22T19:16:07.128143147Z","level":"info","source":"bbs","message":"bbs.locket-lock.started","data":{"lock":{"key":"bbs","owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","type":"lock","type_code":1},"session":"4","ttl_in_seconds":15}}

# this bbs fails to claim the lock because the other bbs has the lock
{"timestamp":"2023-06-22T19:16:07.136452192Z","level":"error","source":"bbs","message":"bbs.locket-lock.failed-to-acquire-lock","data":{"error":"rpc error: code = AlreadyExists desc = lock-collision","lock":{"key":"bbs","owner":"9e859b27-ac41-4e39-ba59-bb87d520d987","type":"lock","type_code":1},"lock-owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","request-uuid":"efb3d6c6-9585-4efb-6f74-126dcfccb99c","session":"4","ttl_in_seconds":15}}
# control/fe89652e-4e59-4efc-b75c-88cc029ed816
# bbs.stdout.log

# this bbs grabs the lock as soon as it is up for grabs
{"timestamp":"2023-06-22T19:16:06.091053490Z","level":"info","source":"bbs","message":"bbs.locket-lock.acquired-lock","data":{"lock":{"key":"bbs","owner":"fe89652e-4e59-4efc-b75c-88cc029ed816","type":"lock","type_code":1},"session":"4",
"ttl_in_seconds":15}}

# once it grabs the lock it gets to work!
{"timestamp":"2023-06-22T19:16:06.092710053Z","level":"info","source":"bbs","message":"bbs.set-lock-held-metron-notifier.started","data":{"session":"5"}}
{"timestamp":"2023-06-22T19:16:06.093315811Z","level":"info","source":"bbs","message":"bbs.task-completion-workpool.starting","data":{"session":"1"}}
...

ameowlia avatar Jun 22 '23 19:06 ameowlia