plugins icon indicating copy to clipboard operation
plugins copied to clipboard

autopilot: Timeout responding to 'init'

Open aido opened this issue 4 years ago • 4 comments

Description

The autopilot plugin consistently times out and terminates while lightningd waits for response to 'init'.

2021-01-19T20:36:12.455Z INFO    plugin-autopilot.py: RPC method 'autopilot-run-once' does not have a docstring.
2021-01-19T20:36:15.926Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2021-01-19T20:36:29.607Z INFO    plugin-autopilot.py: No input specified download graph from peers
2021-01-19T20:36:29.607Z INFO    plugin-autopilot.py: Instantiated networkx graph to store the lightning network
2021-01-19T20:36:29.607Z INFO    plugin-autopilot.py: Attempt RPC-call to download nodes from the lightning network
2021-01-19T20:36:29.613Z INFO    lightningd: --------------------------------------------------
2021-01-19T20:36:29.613Z INFO    lightningd: Server started with public key xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, alias Xxxxxx (color #ffffff) and lightningd v0.9.2
2021-01-19T20:36:30.498Z INFO    plugin-autopilot.py: peering with node: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
2021-01-19T20:37:33.477Z INFO    plugin-autopilot.py: Killing plugin: failed to respond to 'init' in time, terminating.

Steps to reproduce

Trial and error with various values of autopilot-num-channels (including 0) does not change this behaviour, plugin still consistently terminates ~60 seconds after peering with first node.

Possible workaround

From experimenting I see that timeout does not occur if lightningd is built with a larger value for PLUGIN_MANIFEST_TIMEOUT in file lightningd/plugin.c. This is an observation rather than a solution or workaround. Possible cause may be too low or zero funds available although setting autopilot-min-channel-size-msat to 0 does not solve issue either.

Update

From further investigation it would seem that this timeout may be a result of using lightningd over Tor. When the autopilot plugin initializes the following messages start to appear in the Tor log:

Jan 23 20:28:54 tor Tor[59]: Bootstrapped 100% (done): Done
Jan 23 20:39:42 tor Tor[59]: Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
Jan 23 20:40:22 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $30DBF7A773F4EE3BF6D1B7407DAECF40220AC8CD~relayon0337 at 185.220.101.205. Retrying on a new circuit.
Jan 23 20:40:37 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $45E9240AD4ECE01793A1977C1260503B2C2C861F~apx1 at 185.107.47.215. Retrying on a new circuit.
Jan 23 20:40:52 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $E8AD8C4FDC3FE152150C005BB2EAA6A0990B74DF~F3Netze at 185.220.100.243. Retrying on a new circuit.
Jan 23 20:41:08 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $98F793C7320CE3C15A45353AFCC165747A40366D~F3Netze at 185.220.100.255. Retrying on a new circuit.
Jan 23 20:41:22 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $25EEFFF278E1D9C4FCBED1B664B11ECB33532C76~Unnamed at 188.127.251.245. Retrying on a new circuit.
Jan 23 20:41:37 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $AFF2FC5C6F793B6E147EB93C1897D6DDA49E54FD~Wix at 95.211.230.211. Retrying on a new circuit.
Jan 23 20:41:52 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $C00C28C2B1A7D8038517626CECA9BCB23B0A31D2~relayon0224 at 185.220.101.129. Retrying on a new circuit.
Jan 23 20:41:52 tor Tor[59]: Tried for 125 seconds to get a connection to [scrubbed]:9735. Giving up.
.
.
.

This may indicate a more general error with lightning over Tor rather than the autopilot plugn.

aido avatar Jan 19 '21 23:01 aido

Can confirm this is still an issue without TOR, so still probably an issue with the timeout.

Most likely it relates to line 89 in autopilot.py, as the attempt to get the channel list is so large that it just times out before completion.

Maybe there would be a way to run this after the plugin was "initialized" by lightningd, so it doesn't time out

bit0fun avatar Aug 04 '21 16:08 bit0fun

Testing currently, but it looks like it will work. Made a separate plugin command to initialize the autopilot class after the main initialization. Main init just grabs option values, which still functions perfectly.

Running the autopilot-run-once command as a test, but it is promising thus far.

bit0fun avatar Aug 04 '21 16:08 bit0fun

Created PR #284 for fix

bit0fun avatar Aug 05 '21 13:08 bit0fun

@aido did #284 fix it?

PestToast avatar Feb 08 '22 21:02 PestToast