Zebra falsely activates the mempool when the connection goes down during a sync
Motivation
Zebra currently estimates that it is close to the tip if the average number of synced blocks in the last four batches drops below 20. The decision is made in the is_close_to_tip function here: https://github.com/ZcashFoundation/zebra/blob/update-column-family-names/zebrad/src/components/sync/status.rs#L67. Zebra uses the output of this function to activate the mempool.
However, when Zebra loses its peers during the syncing process, is_close_to_tip returns a false positive because the number of synced blocks in the last four batches naturally drops below 20. This happens, for example, when the internet connection goes down on the machine where Zebra is running.
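The heuristic and its false positive can be sketched in a few lines of Rust. This is a minimal standalone sketch, not Zebra's actual code: only the values (four batches, a threshold of 20) come from this issue, while the constant names MAX_RECENT_LENGTHS and MIN_DIST_FROM_TIP, the slice-of-batch-lengths input, and the "too little history means not close" rule are assumptions.

```rust
/// Assumed names; the values come from the issue description.
const MAX_RECENT_LENGTHS: usize = 4;
const MIN_DIST_FROM_TIP: usize = 20;

/// Sketch of the batch-length heuristic (hypothetical, simplified).
fn is_close_to_tip(recent_lengths: &[usize]) -> bool {
    // Assumption: with fewer than four batches of history, we are not close.
    if recent_lengths.len() < MAX_RECENT_LENGTHS {
        return false;
    }
    // Only the most recent four batches are considered.
    let recent = &recent_lengths[recent_lengths.len() - MAX_RECENT_LENGTHS..];
    let average = recent.iter().sum::<usize>() / recent.len();
    average < MIN_DIST_FROM_TIP
}

fn main() {
    // Syncing normally: large batches, so not close to the tip.
    assert!(!is_close_to_tip(&[500, 500, 480, 490]));
    // Genuinely near the tip: only a few new blocks per batch.
    assert!(is_close_to_tip(&[3, 1, 2, 0]));
    // Connection lost mid-sync: empty batches cause the false positive.
    assert!(is_close_to_tip(&[0, 0, 0, 0]));
}
```

The last case is the bug: an average of zero is indistinguishable from "at the tip" unless something else, such as the peer count, is checked.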
Solution
Make sure Zebra has enough connections before is_close_to_tip returns the final decision.
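One way to express this is to gate the existing heuristic on a peer count. The sketch below is hypothetical: is_close_to_tip_with_peers and MIN_READY_PEERS are invented names, and the threshold of one ready peer follows the suggestion made later in this thread for "testnet in a box" setups.

```rust
/// Hypothetical threshold: at least one ready peer.
const MIN_READY_PEERS: usize = 1;

/// Hypothetical wrapper: only trust the batch-length heuristic when Zebra
/// has enough ready peers for small batches to mean "near the tip".
fn is_close_to_tip_with_peers(heuristic_says_close: bool, ready_peers: usize) -> bool {
    ready_peers >= MIN_READY_PEERS && heuristic_says_close
}

fn main() {
    // Connection lost: the heuristic fires, but there are no peers.
    assert!(!is_close_to_tip_with_peers(true, 0));
    // Genuinely near the tip with a single peer.
    assert!(is_close_to_tip_with_peers(true, 1));
    // Still downloading large batches: never close to the tip.
    assert!(!is_close_to_tip_with_peers(false, 8));
}
```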
Steps
- Move WatchReceiver to zebra-chain so that you can use it from both zebra-state and zebra-network.
- Create a new channel in https://github.com/zcashfoundation/zebra/blob/75a679792bb787b95b4e9ce87aaefe205c48c97a/zebra-network/src/peer_set/initialize.rs#L159, store the sender in PeerSet, and return the receiver.
- Use the sender in update_metrics in PeerSet.
- Store the receiver in SyncStatus as WatchReceiver, and use it in is_close_to_tip.
- Test that Zebra does not activate the mempool when it has no peers.
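The steps above can be sketched end to end using only the standard library. In Zebra this would presumably be a tokio::sync::watch channel wrapped in WatchReceiver; here a shared Arc<AtomicUsize> stands in for it, because it models the same "latest value wins" behaviour without extra crates. PeerSet, SyncStatus, update_metrics, and is_close_to_tip are heavily simplified, hypothetical versions of the real items, and peer_count_channel, PeerCountSender, and PeerCountReceiver are invented names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Std-only stand-in for a watch channel carrying the latest peer count.
struct PeerCountSender(Arc<AtomicUsize>);
#[derive(Clone)]
struct PeerCountReceiver(Arc<AtomicUsize>);

fn peer_count_channel() -> (PeerCountSender, PeerCountReceiver) {
    let shared = Arc::new(AtomicUsize::new(0));
    (PeerCountSender(shared.clone()), PeerCountReceiver(shared))
}

impl PeerCountSender {
    /// Overwrites the previous value, like a watch channel.
    fn send(&self, count: usize) {
        self.0.store(count, Ordering::Relaxed);
    }
}

impl PeerCountReceiver {
    /// Reads the most recently sent value.
    fn borrow(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Simplified PeerSet: only the part that publishes metrics.
struct PeerSet {
    ready_peers: usize,
    peer_count_tx: PeerCountSender,
}

impl PeerSet {
    fn update_metrics(&self) {
        // Publish the current number of ready peers to the sync status.
        self.peer_count_tx.send(self.ready_peers);
    }
}

/// Simplified SyncStatus: recent batch lengths plus the peer count receiver.
struct SyncStatus {
    recent_lengths: Vec<usize>,
    peer_count_rx: PeerCountReceiver,
}

impl SyncStatus {
    fn is_close_to_tip(&self) -> bool {
        const MAX_RECENT_LENGTHS: usize = 4;
        const MIN_DIST_FROM_TIP: usize = 20;
        const MIN_READY_PEERS: usize = 1; // hypothetical threshold

        // Without peers (or enough history), small batches mean nothing.
        if self.peer_count_rx.borrow() < MIN_READY_PEERS
            || self.recent_lengths.len() < MAX_RECENT_LENGTHS
        {
            return false;
        }
        let recent = &self.recent_lengths[self.recent_lengths.len() - MAX_RECENT_LENGTHS..];
        recent.iter().sum::<usize>() / recent.len() < MIN_DIST_FROM_TIP
    }
}

fn main() {
    let (tx, rx) = peer_count_channel();
    let mut peer_set = PeerSet { ready_peers: 0, peer_count_tx: tx };
    let status = SyncStatus { recent_lengths: vec![0, 0, 0, 0], peer_count_rx: rx };

    // No peers: empty batches must not activate the mempool.
    peer_set.update_metrics();
    assert!(!status.is_close_to_tip());

    // One ready peer and small batches: close to the tip.
    peer_set.ready_peers = 1;
    peer_set.update_metrics();
    assert!(status.is_close_to_tip());
}
```

A watch-style channel fits this design because is_close_to_tip only needs the most recent peer count, not every intermediate update.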
What happens if we're using testnet in a box, or we only have a small number of working peers, but we are actually near the tip?
Here are some possible solutions:
- use a different minimum for testnet, because it only usually has 5-8 peers in total
- check that we're near the previous maximum number of peers (but peers can go down)
- check that we're near the estimated tip, based on our local clock (but clocks can be wrong)
I think checking if Zebra has at least one or two peers before returning true from is_close_to_tip would be sufficient.
OK, sounds good. I would suggest just one peer (that's a common "testnet in a box" setup).
We're making network issues a lower priority for now.
This might add unwanted load to other nodes, so it's worth fixing.
Marek and I had a chat about this ticket. It's not a release blocker, so we're going to update the ticket with a design and a draft branch, and move it out of this sprint.
We can fix it if it ever becomes a problem for users.
I updated the solution in the PR description with implementation details.
We're not making changes this big while the audit is pending.
This is actually happening and causing CI failures:
assertion failed: output.stdout_line_contains("activating mempool").is_err() https://github.com/ZcashFoundation/zebra/actions/runs/3237875279/jobs/5305448541#step:15:520
This might have caused PR #5596 to fail in the merge queue:
Message: assertion failed: output.stdout_line_contains("activating mempool").is_err()
https://github.com/ZcashFoundation/zebra/actions/runs/3430430965/jobs/5717404168#step:3:226
This doesn't seem to be causing any issues for users?
We can reopen this if needed in the future.