python-teos icon indicating copy to clipboard operation
python-teos copied to clipboard

c-lightning crashed, since then loading watchtower crashes c-lightning instantly

Open svewa opened this issue 2 years ago • 2 comments

So I was running for a few days when CL was OOM killed. Since then starting CL with watchtower plugin or loading watchtower plugin after CL successfully started, CL crashes with:

022-03-05T01:49:20.689Z INFO    plugin-backup.py: root        : Comparing backup version 4771214 versus first write version 4771215
2022-03-05T01:49:20.689Z INFO    plugin-backup.py: Comparing backup version 4771214 versus first write version 4771215
2022-03-05T01:49:20.689Z INFO    plugin-backup.py: root        : Versions match up
2022-03-05T01:49:20.689Z INFO    plugin-backup.py: Versions match up
2022-03-05T01:49:31.236Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2022-03-05T01:49:35.749Z INFO    plugin-autoclean: autocleaning every 600 seconds
2022-03-05T01:49:35.769Z INFO    plugin-watchtower.py: A key file for the watchtower client already exists. Loading it
2022-03-05T01:49:35.770Z INFO    lightningd: --------------------------------------------------
2022-03-05T01:49:35.770Z INFO    lightningd: Server started with public key xxx, alias xxx (color #xxx) and lightningd v0.10.2
2022-03-05T01:49:35.829Z INFO    plugin-watchtower.py: Plugin watchtower client initialized. User id = xxx
2022-03-05T01:49:35.930Z INFO    plugin-watchtower.py: Cannot load towers db. Resource temporarily unavailable
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py: Traceback (most recent call last):
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:   File \"/home/cl/src/python-teos/watchtower-plugin/watchtower.py\", line 166, in init
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:     plugin.wt_client = WTClient(user_sk, user_id, conf)
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:   File \"/home/cl/src/python-teos/watchtower-plugin/watchtower.py\", line 90, in __init__
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:     self.db_manager = TowersDBM(config.get(\"TOWERS_DB\"), plugin)
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:   File \"/home/cl/src/python-teos/watchtower-plugin/towers_dbm.py\", line 25, in __init__
2022-03-05T01:49:35.937Z INFO    plugin-watchtower.py:     super().__init__(db_path)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/.local/lib/python3.9/site-packages/common/db_manager.py\", line 23, in __init__
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     self.db = plyvel.DB(db_path, create_if_missing=True)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"plyvel/_plyvel.pyx\", line 247, in plyvel._plyvel.DB.__init__
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"plyvel/_plyvel.pyx\", line 88, in plyvel._plyvel.raise_for_status
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: plyvel._plyvel.IOError: b'IO error: lock /home/cl/.watchtower/towers/LOCK: Resource temporarily unavailable'
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: 
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: During handling of the above exception, another exception occurred:
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: 
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: Traceback (most recent call last):
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 621, in _dispatch_request
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     result = self._exec_func(method.func, request)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 606, in _exec_func
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     return func(*ba.args, **ba.kwargs)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 845, in _init
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     return self._exec_func(self.child_init, request)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 606, in _exec_func
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     return func(*ba.args, **ba.kwargs)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:   File \"/home/cl/src/python-teos/watchtower-plugin/watchtower.py\", line 170, in init
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py:     raise IOError(error)
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: OSError: Cannot load towers db. Resource temporarily unavailable
2022-03-05T01:49:35.938Z INFO    plugin-watchtower.py: 
2022-03-05T01:49:37.935Z INFO    xxx-channeld-chan#22075: Peer connection lost
2022-03-05T01:49:37.935Z UNUSUAL xxx-channeld-chan#22075: Status closed, but waitpid 1076676 says No child processes
2022-03-05T01:49:37.936Z INFO    xxx-chan#22075: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (-1)
2022-03-05T01:49:38.468Z INFO    xxx-chan#22075: Peer transient failure in CHANNELD_NORMAL: Reconnected
2022-03-05T01:49:40.692Z **BROKEN** lightningd: Plugin for commitment_revocation returned non-result response {"jsonrpc": "2.0", "id": 145, "error": {"code": -32600, "message": "Error while processing commitment_revocation: 'Plugin' object has no attribute 'wt_client'", "traceback": "Traceback (most recent call last):\n  File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 621, in _dispatch_request\n    result = self._exec_func(method.func, request)\n  File \"/home/cl/.local/lib/python3.9/site-packages/pyln/client/plugin.py\", line 606, in _exec_func\n    return func(*ba.args, **ba.kwargs)\n  File \"/home/cl/src/python-teos/watchtower-plugin/watchtower.py\", line 392, in on_commitment_revocation\n    signature = Cryptographer.sign(appointment.serialize(), plugin.wt_client.sk)\nAttributeError: 'Plugin' object has no attribute 'wt_client'\n"}}
2022-03-05T01:49:40.741Z **BROKEN** lightningd: FATAL SIGNAL 6 (version v0.10.2)
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: common/daemon.c:38 (send_backtrace) 0x5563a03b5da1
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: common/daemon.c:46 (crashdump) 0x5563a03b5df0
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f11442cdd5f
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f11442cdce1
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f11442b7536
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/log.c:804 (fatal) 0x5563a0359547
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/plugin_hook.c:179 (plugin_hook_callback) 0x5563a03847fe
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/plugin.c:521 (plugin_response_handle) 0x5563a037ea9b
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/plugin.c:627 (plugin_read_json_one) 0x5563a037ecc0
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/plugin.c:672 (plugin_read_json) 0x5563a037eea0
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5563a041a90d
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5563a041b4b7
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5563a041b4f5
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x5563a041d6cb
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:21 (io_loop_with_timers) 0x5563a03509df
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1163 (main) 0x5563a03568d7
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f11442b8d09
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x5563a032c399
2022-03-05T01:49:40.741Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff

the LOCK file could be cat'ed, removing it resolved the issue.

svewa avatar Mar 05 '22 02:03 svewa

Looks like when your node crashed the LOCK could not be cleared and it was preventing the tower to load again. What is pretty weird is that the plugin failing to load was killing the node.

I'm pretty certain it used to be the case that if an exception like that was raised, the plugin was killed but the node will run without it. Have you setup the plugin to be important?

--important-plugin <arg>.   Add an important plugin to be run (can be used multiple times). 
                            Die if the plugin dies.

sr-gi avatar Mar 05 '22 07:03 sr-gi

No, it's not important, just "plugin".

svewa avatar Mar 05 '22 13:03 svewa