failed to recv on P2P conversation: VerifyingError("Invalid message signature")
When we update the Hiro seed nodes, the stacks nodes that bootstrap from them start emitting thousands of log lines a minute, consisting of nothing but the following:
stacks-blockchain {"msg":"convo:id=53,outbound=true,peer=UNKNOWN+UNKNOWN://34.150.184.50:20444: failed to recv on P2P conversation: VerifyingError(\"Invalid message signature\")","level":"INFO","ts":"2024-01-30T15:24:35.220519856Z","thread":"p2p-(0.0.0.0:20444,0.0.0.0:20443)","line":1933,"file":"src/net/chat.rs"}
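To quantify the flood and confirm that every error points at the same peer, the log can be tallied per peer address. This is a minimal sketch, assuming every line looks like the sample above (an optional `stacks-blockchain` prefix followed by a JSON object with a `msg` field). The function name `count_signature_errors` and the regular expression are my own, not part of stacks-blockchain.

```python
import json
import re
from collections import Counter

# Matches the "peer=<something>://<ip>:<port>" fragment seen in the sample log line.
PEER_RE = re.compile(r"peer=[^:]+://(?P<addr>[\d.]+):(?P<port>\d+)")


def count_signature_errors(lines):
    """Count 'Invalid message signature' log lines per peer IP address."""
    counts = Counter()
    for line in lines:
        start = line.find("{")  # skip any prefix before the JSON payload
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line, ignore it
        msg = record.get("msg", "")
        if "Invalid message signature" not in msg:
            continue
        match = PEER_RE.search(msg)
        counts[match.group("addr") if match else "unknown"] += 1
    return counts
```

For example, `count_signature_errors(open("node.log"))` (with `node.log` as a placeholder file name) returns a `Counter` keyed by peer IP, which makes it easy to see whether the errors come from one address or several.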
Restarting the affected stacks nodes does not help. However, waiting 15 minutes, restarting the seed nodes again, and then restarting the other stacks nodes again seems to fix the problem. To clarify, the resolution workflow looks like this:
- Update seed node network config (e.g. external IP address change)
- Restart seed nodes, wait for them to come up
- Restart other stacks nodes, wait for them to come up. The problem persists.
- Wait 15 minutes, then restart seed nodes again and wait for them to come up
- Restart other stacks nodes, wait for them to come up. Problem is resolved.
I'm not sure what could be causing this. All stacks nodes behind the address in the log above are seed nodes, and they all have the same public key. This makes updating seed nodes feel fragile, but perhaps there's a reason for this that I'm not aware of.
So my questions are:
- If a seed node's external IP changes, do the nodes using it pick up the change through updated DNS resolution and act on it? Or is DNS resolved only once, at startup?
- Given that all nodes behind the address in the log are running the same public key, what could be causing this VerifyingError? I've used the 5 steps above to fix this situation multiple times now, so it seems like something solvable given how easily I've been able to reproduce and fix it.
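To help narrow down the DNS question from the outside, the address a node is still talking to (taken from the log) can be compared with what the seed hostname resolves to right now. This is only a sketch: it says nothing about how stacks-blockchain itself resolves or caches DNS, and the function names and the default port 20444 (taken from the log line above) are my assumptions.

```python
import socket


def resolve_addresses(hostname, port=20444):
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def is_stale(peer_ip, resolved_ips):
    """True if the peer IP seen in the log is no longer among the resolved IPs."""
    return peer_ip not in resolved_ips
```

Running `is_stale("34.150.184.50", resolve_addresses("<seed hostname>"))` right after a seed node IP change would show whether the affected nodes are still connecting to the old address.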