forest
SyncSubmitBlock
The block submission to the swarm is done.
The sanity checks are not, as this requires introducing some extra abstractions and infra.
Hi!
Unfortunately, seems like the chain head still doesn't get updated. I'm not sure why. I've set tracing to debug (info for the most noise-producing stuff) and I'm attaching a log. Hopefully it's at least a little bit helpful.
The commit hash from which Forest was built for testing is 3a1291d4655fb641a85d25dd288cd3f19d5c2c2d.
The testing was performed on a local 2K devnet, with Forest bootstrapped and synced via a Lotus node.
Testing used a custom miner that has been tested against Lotus and works on the same devnet when pointed at the Lotus node.
@alt-ivan how do I reproduce this? I’d love to see the steps both for Lotus and Forest to find the culprit. Thank you in advance!
@ruseinov
Hi Roman,
I don't currently have a great method of reproducing to share. I tried quickly jury-rigging lotus-miner to work with Forest as a means of creating a repro, but I haven't had much success yet.
Here's the RPC sequence used by the custom miner:
- ChainHead
- MinerGetBaseInfo -> Beacon entries & is eligible for mining
- WalletSign -> VRFProof for computing election proof
- MinerGetBaseInfo -> Sector info for winning PoST
- WalletSign -> Signing computed block ticket hash
- MpoolSelect -> Messages to include in block
- MinerCreateBlock
- SyncSubmitBlock -> With result from MinerCreateBlock
- ChainHead
The only change I made is to hard-code block_delay_secs from StateGetNetworkParams as that RPC isn't implemented in Forest yet. I used the same value returned by the Lotus node.
I have also manually run each of these against both Lotus/Forest synced to the same tipset/head and all RPCs listed here return identical responses on both Forest and Lotus nodes for the same inputs.
If you would find it helpful, I can share the request/response for each of these as well.
Edit:
- MinerCreateBlock
- SyncSubmitBlock -> With result from MinerCreateBlock
These should be enough for manual testing if you have Forest/Lotus synced to the same head. I suppose you could at that point take MinerCreateBlock result from lotus-miner and manually feed it into both nodes' SyncSubmitBlock RPCs
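For manual testing, calls like these could be sent as JSON-RPC 2.0 request bodies. Below is a minimal sketch of building such a body. It assumes the `Filecoin.` method prefix used by the Lotus JSON-RPC API; the helper name `rpc_request` is hypothetical, and the actual params have to come from your own miner and node.

```rust
/// Build a JSON-RPC 2.0 request body for a Filecoin node.
/// `params_json` must already be valid JSON, for example "[]".
/// Hypothetical helper for illustration; not part of Forest or Lotus.
fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    // Double braces are literal braces; single braces interpolate.
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"Filecoin.{method}","params":{params_json}}}"#
    )
}
```

The same body can then be posted to both nodes, for example `rpc_request(1, "ChainHead", "[]")` before and after `SyncSubmitBlock`, to compare the returned heads.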
@alt-ivan Thanks for lining that up for me, it's helpful. The requests and responses would be nice to see as well if possible.
With the sequence above - how can we be sure that the head is never set?
Bear in mind that the way this works in forest is that a block is sent to the message queue for processing, so there are no guarantees as to exactly when it actually gets processed, though it should be quite fast. Have you tried verifying that the given block never makes it into the chain?
3a1291d4655fb641a85d25dd288cd3f19d5c2c2d
This does not seem to exist anymore, could you try the latest main? Though as long as it had my latest SyncSubmitBlock change - it should not matter.
Edit: I have double checked the files there and it seems correct.
@ruseinov
The requests and responses would be nice to see as well if possible.
Seems like I didn't save them after all 🤦 ... I'll redo the sequence and send the requests/responses. Sorry for the delay.
With the sequence above - how can we be sure that the head is never set?
Calling ChainHead returns the same tipset as before the SyncSubmitBlock call. Also, SyncSubmitBlock can be called over and over again with the same block and it will return a successful result, but the block seems to get "eaten" somewhere.
Bear in mind that the way this works in forest is that a block is sent to the message queue for processing, so there are no guarantees as to exactly when it actually gets processed, though it should be quite fast. Have you tried verifying that the given block never makes it into the chain?
I figured as much when looking at the codebase. I only stepped through the SyncSubmitBlock impl and there I saw that the created tipset doesn't get a CID assigned to it at that point in time. After that I didn't know where to follow the propagation of the block to see if it gets recorded anywhere using GetChainBlock, because I couldn't find its CID if it did.
Seems like I didn't save them after all 🤦 ... I'll redo the sequence and send the requests/responses. Sorry for the delay.
Please do!
I figured as much when looking at the codebase. I only stepped through the SyncSubmitBlock impl and there I saw that the created tipset doesn't get a CID assigned to it at that point in time. After that I didn't know where to follow the propagation of the block to see if it gets recorded anywhere using GetChainBlock, because I couldn't find its CID if it did.
That is helpful, let me check..
@alt-ivan in fact, the tipset key is lazily initialized when fn key() is called.
https://github.com/ChainSafe/forest/blob/4a6ad5c8f4c90ed0dcf13f3069540545fad0a8f4/src/blocks/tipset.rs#L301-L304
So you can basically check the CIDs of headers to see which ones to monitor.
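A minimal sketch of that lazy-initialization pattern, to show why no key is visible right after the tipset is built. The `Header` and `Tipset` types here are simplified, hypothetical stand-ins rather than Forest's actual definitions, and CIDs are plain strings for illustration.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a block header; only the CID matters here.
struct Header {
    cid: String,
}

/// Hypothetical stand-in for a tipset whose key is computed on demand.
struct Tipset {
    headers: Vec<Header>,
    // Empty at construction; filled in by the first `key()` call.
    key: OnceLock<Vec<String>>,
}

impl Tipset {
    fn new(headers: Vec<Header>) -> Self {
        Tipset { headers, key: OnceLock::new() }
    }

    /// Returns the key, deriving it from the header CIDs on first use.
    fn key(&self) -> &[String] {
        self.key
            .get_or_init(|| self.headers.iter().map(|h| h.cid.clone()).collect())
    }

    /// True once `key()` has been called at least once.
    fn key_is_initialized(&self) -> bool {
        self.key.get().is_some()
    }
}
```

In this sketch the key is just the list of header CIDs, which is why the header CIDs are enough to know what to look for.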
For further debugging:
- Here's where tipset channel is being consumed: https://github.com/ChainSafe/forest/blob/32ae416f8dbd34a2e0eadf0ffbbeb9846e41e380/src/chain_sync/tipset_syncer.rs#L348
- Here's where the rest of the logic happens: https://github.com/ChainSafe/forest/blob/32ae416f8dbd34a2e0eadf0ffbbeb9846e41e380/src/chain_sync/tipset_syncer.rs#L375-L573
@ruseinov
I tried to adapt lotus-miner so I could give you easy steps to reproduce (to stop at MinerCreateBlock so you can feed the output of that into SyncSubmitBlock) but I ran into so many speedbumps that I gave up for now :/
Some new interesting info:
So you can basically check the CIDs of headers to see which ones to monitor.
When I connect Forest/Lotus and execute SyncSubmitBlock on Forest, it propagates that block to Lotus and Lotus now has the correct chain head with the latest tipset. Forest also receives this block, seemingly back from Lotus via gossip, because I can query it via GetChainBlock and see it. However, Forest's ChainHead still points to the previous epoch tipset.
When I don't connect Forest/Lotus and execute SyncSubmitBlock on Forest, then that block doesn't get stored and I cannot query it via GetChainBlock
Here's where tipset channel is being consumed:
I added tracing into these locations and nothing happens. Doing some more eye-level examination I saw this:
fn follow(&self, tipset_opt: Option<FullTipset>) -> ChainMuxerFuture<(), ChainMuxerError> {
// Instantiate a TipsetProcessor
But in devnet it seems that Forest never enters follow mode https://github.com/ChainSafe/forest/issues/3089
From what I saw, tipset_receiver only gets consumed here via TipsetProcessor? I might be wrong though.
This may also explain other sync issues with Lotus in devnet i.e. Forest often doesn't sync fully with Lotus with similar "symptoms" (ChainHead returns earlier tipset but later one exists in store)
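The symptom described above (submission succeeds repeatedly, head never moves) is what one would expect from a queue whose consumer never runs. Here is a toy sketch of that situation using a standard-library channel. `Node`, `submit_block` and the `u64` "block" are hypothetical simplifications, not Forest's real types or control flow.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a node: a queue of submitted blocks plus a head.
struct Node {
    tx: Sender<u64>,   // submit side; a block is represented by its epoch
    rx: Receiver<u64>, // consumer side, drained only in "follow" mode
    head_epoch: u64,
}

impl Node {
    fn new(head_epoch: u64) -> Self {
        let (tx, rx) = channel();
        Node { tx, rx, head_epoch }
    }

    /// Enqueue a block. This succeeds as soon as the block is queued,
    /// whether or not anything ever processes it.
    fn submit_block(&self, epoch: u64) -> Result<(), String> {
        self.tx.send(epoch).map_err(|e| e.to_string())
    }

    /// Drain the queue and advance the head. Stands in for the processor
    /// that only runs once the node enters follow mode.
    fn follow(&mut self) {
        while let Ok(epoch) = self.rx.try_recv() {
            if epoch > self.head_epoch {
                self.head_epoch = epoch;
            }
        }
    }
}
```

Until `follow` is called, every `submit_block` returns `Ok(())` while `head_epoch` stays put, which mirrors the behaviour seen on the devnet.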
@alt-ivan Thanks a lot for looking into this, now it starts to make a lot of sense!
When I don't connect Forest/Lotus and execute SyncSubmitBlock on Forest, then that block doesn't get stored and I cannot query it via GetChainBlock
That's probably due to the fact that the actual network send works, but the TipsetProcessor part does not work.
But in devnet it seems that Forest never enters follow mode #3089
You are correct, it seems it never does. Good spot! And logically you can't submit a block if the node is never "in sync". I guess we'll have to deal with that to fix this.
@alt-ivan Can you please try going through the process using this commit: https://github.com/ChainSafe/forest/pull/4328 ?
It's going to be merged to master automatically in a little bit. This should fix the follow issue.
@ruseinov I will be testing shortly and reporting back here. Thanks!
@ruseinov Good news! I tried it on the latest commit and mining works!
Couple of notes:
- It seems that Lotus processes the block received via SyncSubmitBlock synchronously while Forest does it asynchronously, leading to a small delay between SyncSubmitBlock and the tipset being processed/chain head getting updated. ~~This shouldn't be a big deal in practice because I've just added a small delay to the miner to tackle this issue, but I assume it could potentially be a problem since this delay isn't deterministic.~~ In fact I can just tweak the miner to wait for tipset at expected height so you can ignore this altogether :D
- The Forest node now enters follow mode and properly syncs up with Lotus even without active miners on Lotus, hooray!
- Now we have a situation where Lotus fails some validation when syncing to Forest in devnet :) I'll paste Lotus logs below for posterity, but I won't expand on this here since it's outside the scope of this issue
lotus | 2024-05-15T15:22:37.565Z ERROR chain chain/sync_manager.go:255 error during sync in [bafy2bzaceajmcnxkqbwwe7fzw5ildicadw57kw2eyu2sprwynhqeklxlbaazs]: collectChain failed: chain linked to block marked previously as bad ([bafy2bzaceajmcnxkqbwwe7fzw5ildicadw57kw2eyu2sprwynhqeklxlbaazs], bafy2bzacedt364lbbw4xavfnmbka2k4r27245g3zc3kepzy2ejeyafyx7rio4) (reason: linked to bafy2bzaceaewkhk2ikz7d46ilbc7j2tbcfydcrl4u3qykgkank4th4x2xwu3q caused by: [bafy2bzacecf2hjz7sarl3qfphjn5u5mnbfmfbmeefepqp7atnzbc2gnrqkcz4] 1 error occurred:
lotus | * failed to validate blocks random beacon values:
lotus | github.com/filecoin-project/lotus/chain/consensus/filcns.(*FilecoinEC).ValidateBlock.func4
lotus | /lotus/chain/consensus/filcns/filecoin.go:248
lotus | - unexpected beacon round 7660175, expected 7660174 for epoch 180:
lotus | github.com/filecoin-project/lotus/chain/beacon.ValidateBlockValues
lotus | /lotus/chain/beacon/beacon.go:99
Good news! I tried it on the latest commit and mining works!
Nice!
- It seems that Lotus processes the block received via SyncSubmitBlock synchronously while Forest does it asynchronously, leading to a small delay between SyncSubmitBlock and the tipset being processed/chain head getting updated. This shouldn't be a big deal in practice because I've just added a small delay to the miner to tackle this issue, but I assume it could potentially be a problem since this delay isn't deterministic
~Perhaps the best option is to have some sort of retry mechanism there. It should not be long, but we currently have no means of making sure this is synchronous. However we're working on a new state machine that might allow us to solve this issue cc @lemmih . I'm happy to contribute that part of the codebase if need be, but for now we have to make do with the current setup.~
Ah yeah, if you can wait for a tipset - all good then. I think limiting this to a sync operation is not necessary.
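Waiting for a tipset on the miner side could look roughly like the sketch below. It polls until the head reaches the expected height or a timeout elapses. The `head_height` closure is a hypothetical placeholder: in a real miner it would wrap a ChainHead RPC call and return the reported height. The function name and parameters are likewise illustrative.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `head_height` until it reports at least `expected`,
/// or give up once `timeout` has elapsed.
/// Returns the observed height on success, `None` on timeout.
fn wait_for_height<F>(
    mut head_height: F,
    expected: u64,
    poll: Duration,
    timeout: Duration,
) -> Option<u64>
where
    F: FnMut() -> u64,
{
    let deadline = Instant::now() + timeout;
    loop {
        let height = head_height();
        if height >= expected {
            return Some(height);
        }
        if Instant::now() >= deadline {
            return None;
        }
        // Back off briefly before asking the node again.
        sleep(poll);
    }
}
```

A bounded wait like this replaces the fixed delay: the miner proceeds as soon as the head catches up and fails loudly if it never does.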
3. Now we have a situation where Lotus fails some validation when syncing to Forest in devnet :) I'll paste Lotus logs below for posterity, but I won't expand on this here since it's outside the scope of this issue
Gotta investigate. Would be great to have a semi-automated reproducible case.