
display actionable error message when sync does not start

Open · egasimus opened this issue 1 year ago · 1 comment

Recently, we began to frequently encounter cases where our local "resync" (pseudo-archival) nodes successfully credit initial balances, and then do not begin to sync from the provided persistent_peers.

INFO namada_node::shell::init_chain: Crediting X nam tokens to Y
INFO namada_node::shell::init_chain: Crediting A nam tokens to Z
# ... repeat for thousands of lines ...
# and then crickets 🦗🦗🦗
  • Sometimes this is due to misconfiguring the persistent peers, e.g. wrong hostname, wrong port, wrong docker networking config...
  • Sometimes it may be because the peers allow only 1 connection per source IP (a problem if you want to do e.g. a blue/green deploy from the same IP).
  • Sometimes, confusingly, you wait for 2 minutes and it does begin to sync.
  • Sometimes, even more confusingly, everything is fine, except that the node has stopped retrying, and the sync only begins after a manual restart of the node.
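To rule out the first cause (wrong hostname, port, or networking) before digging further, a quick reachability check on each configured peer can help. The following is a minimal sketch, not part of Namada: it assumes the persistent_peers value uses the CometBFT-style `id@host:port` form, optionally prefixed with a scheme such as `tcp://`, and the function names are made up for illustration. A successful TCP connect only proves the port is open; it says nothing about whether the peer will accept the P2P handshake.

```python
import socket


def parse_persistent_peers(value):
    """Split a persistent_peers string into (node_id, host, port) tuples.

    Assumes comma-separated entries of the form [scheme://]id@host:port.
    """
    peers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Drop an optional scheme such as "tcp://".
        if "://" in entry:
            entry = entry.split("://", 1)[1]
        node_id, _, address = entry.rpartition("@")
        host, _, port = address.rpartition(":")
        # Bracketed IPv6 literals like [::1] need the brackets removed.
        peers.append((node_id, host.strip("[]"), int(port)))
    return peers


def is_reachable(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `is_reachable(host, port)` for each tuple returned by `parse_persistent_peers(...)` from inside the same container or network namespace as the node separates "peer unreachable" from the other three cases.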

These have different root causes, yet in all cases there is zero feedback from namadan as to what is wrong. This makes it difficult to determine and take the appropriate next step in a timely manner, which puts unreasonable strain on our DevOps resources.

It would be immensely helpful if the state of "failure to begin sync" resulted in an explanatory error message being emitted at INFO, WARNING, or ERROR level.

Looking at the way run_aux launches multiple sub-tasks on an asynchronous basis, I'd venture a guess that it will also be necessary to repeat that message periodically, so that it doesn't get lost in the scrollback from the crediting tokens messages.
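To illustrate the "repeat periodically" idea, here is a rough operator-side sketch of the behavior being requested. It is not how run_aux works and not Namada code: `warn_until_sync_starts`, the logger name, and the `get_height` callback are all hypothetical. The callback could, for example, read the latest block height from the node's RPC status, but that wiring is left out here.

```python
import logging
import time

log = logging.getLogger("sync_watchdog")  # hypothetical logger name


def warn_until_sync_starts(get_height, interval=30.0, max_warnings=None,
                           sleep=time.sleep):
    """Emit a warning every `interval` seconds while get_height() is <= 0.

    Stops once a positive height is reported, or after `max_warnings`
    warnings when that limit is set. Returns the number of warnings emitted.
    """
    warnings = 0
    started = time.monotonic()
    while get_height() <= 0:
        if max_warnings is not None and warnings >= max_warnings:
            break
        waited = time.monotonic() - started
        log.warning(
            "sync has not started after %.0fs; "
            "check persistent_peers and peer connectivity",
            waited,
        )
        warnings += 1
        sleep(interval)
    return warnings
```

Because the message is re-emitted on a timer, it stays visible at the tail of the log even after thousands of crediting lines have scrolled past.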

egasimus · Nov 18 '24 10:11

All network code is handled by CometBFT. Namada hides its output by default, but you can export NAMADA_CMT_STDOUT=true and CMT_LOG_LEVEL=info or CMT_LOG_LEVEL=debug to see what's going on at the P2P level. Be warned that setting CometBFT's log level to debug generates incredibly noisy output.
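For anyone launching the node from a script, a small sketch of setting those two variables for a child process follows. Only the variable names NAMADA_CMT_STDOUT and CMT_LOG_LEVEL come from the comment above; the helper name is made up, and the launch command shown in the trailing comment is an assumption that should be replaced with whatever command you already use to start the node.

```python
import os


def cometbft_logging_env(level="info", base=None):
    """Return a copy of the environment with CometBFT output enabled.

    `level` is passed through as-is; "info" and "debug" are the values
    mentioned above, and "debug" is very noisy.
    """
    env = dict(os.environ if base is None else base)
    env["NAMADA_CMT_STDOUT"] = "true"
    env["CMT_LOG_LEVEL"] = level
    return env


# Example launch (the command line is an assumption, adjust to your setup):
#   import subprocess
#   subprocess.run(["namadan", "ledger", "run"], env=cometbft_logging_env("info"))
```

Starting at "info" and switching to "debug" only when the P2P messages are still inconclusive keeps the log volume manageable.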

sug0 · Nov 18 '24 21:11