autopilot: Timeout responding to 'init'
Description
The autopilot plugin consistently times out and terminates while lightningd waits for response to 'init'.
2021-01-19T20:36:12.455Z INFO plugin-autopilot.py: RPC method 'autopilot-run-once' does not have a docstring.
2021-01-19T20:36:15.926Z INFO plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2021-01-19T20:36:29.607Z INFO plugin-autopilot.py: No input specified download graph from peers
2021-01-19T20:36:29.607Z INFO plugin-autopilot.py: Instantiated networkx graph to store the lightning network
2021-01-19T20:36:29.607Z INFO plugin-autopilot.py: Attempt RPC-call to download nodes from the lightning network
2021-01-19T20:36:29.613Z INFO lightningd: --------------------------------------------------
2021-01-19T20:36:29.613Z INFO lightningd: Server started with public key xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, alias Xxxxxx (color #ffffff) and lightningd v0.9.2
2021-01-19T20:36:30.498Z INFO plugin-autopilot.py: peering with node: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
2021-01-19T20:37:33.477Z INFO plugin-autopilot.py: Killing plugin: failed to respond to 'init' in time, terminating.
Steps to reproduce
Trial and error with various values of autopilot-num-channels (including 0) does not change this behaviour, plugin still consistently terminates ~60 seconds after peering with first node.
Possible workaround
From experimenting I see that timeout does not occur if lightningd is built with a larger value for PLUGIN_MANIFEST_TIMEOUT in file lightningd/plugin.c. This is an observation rather than a solution or workaround.
Possible cause may be too low or zero funds available although setting autopilot-min-channel-size-msat to 0 does not solve issue either.
Update
From further investigation it would seem that this timeout may be a result of using lightningd over Tor. When the autopilot plugin initializes the following messages start to appear in the Tor log:
Jan 23 20:28:54 tor Tor[59]: Bootstrapped 100% (done): Done
Jan 23 20:39:42 tor Tor[59]: Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
Jan 23 20:40:22 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $30DBF7A773F4EE3BF6D1B7407DAECF40220AC8CD~relayon0337 at 185.220.101.205. Retrying on a new circuit.
Jan 23 20:40:37 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $45E9240AD4ECE01793A1977C1260503B2C2C861F~apx1 at 185.107.47.215. Retrying on a new circuit.
Jan 23 20:40:52 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $E8AD8C4FDC3FE152150C005BB2EAA6A0990B74DF~F3Netze at 185.220.100.243. Retrying on a new circuit.
Jan 23 20:41:08 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $98F793C7320CE3C15A45353AFCC165747A40366D~F3Netze at 185.220.100.255. Retrying on a new circuit.
Jan 23 20:41:22 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $25EEFFF278E1D9C4FCBED1B664B11ECB33532C76~Unnamed at 188.127.251.245. Retrying on a new circuit.
Jan 23 20:41:37 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $AFF2FC5C6F793B6E147EB93C1897D6DDA49E54FD~Wix at 95.211.230.211. Retrying on a new circuit.
Jan 23 20:41:52 tor Tor[59]: We tried for 15 seconds to connect to '[scrubbed]' using exit $C00C28C2B1A7D8038517626CECA9BCB23B0A31D2~relayon0224 at 185.220.101.129. Retrying on a new circuit.
Jan 23 20:41:52 tor Tor[59]: Tried for 125 seconds to get a connection to [scrubbed]:9735. Giving up.
.
.
.
This may indicate a more general error with lightning over Tor rather than the autopilot plugn.
Can confirm this is still an issue without TOR, so still probably an issue with the timeout.
Most likely it relates to line 89 in autopilot.py, as the attempt to get the channel list is so large that it just times out before completion.
Maybe there would be a way to run this after the plugin was "initialized" by lightningd, so it doesn't time out
Testing currently, but it looks like it will work. Made a separate plugin command to initialize the autopilot class after the main initialization. Main init just grabs option values, which still functions perfectly.
Running the autopilot-run-once command as a test, but it is promising thus far.
Created PR #284 for fix
@aido did #284 fix it?