forest
Exporting a snapshot with depth=900 might not be sufficient to bootstrap a forest or lotus node
Describe the bug
To Reproduce
Steps to reproduce the behaviour:
1. forest --chain calibnet --encrypt-keystore false --no-gc --height=-900 --auto-download-snapshot
2. forest-cli snapshot export -d=900
3. forest --chain calibnet --encrypt-keystore false --import-snapshot [SNAPSHOT]
   or
   lotus --remove-existing-chain --import-snapshot [SNAPSHOT]
4. The import fails with the errors below:
7u5nd7v6ovgi4dyitvsc757ayemvbmdg bafy2bzacecxlqvpliee2eliwtrw2f4jg2tmsxgamsaiphhm6guxprljkv57jw]: collectChain failed: collectChain syncMessages: message processing failed: validating block bafy2bzacebd5w4eezqkhbilor4xu3gvteaoquc762v2ucllhjdqmwmhtpbh4c: 1 error occurred:
* determining if miner has min power failed:
github.com/filecoin-project/lotus/chain/consensus/filcns.(*FilecoinEC).ValidateBlock.func2
/opt/filecoin/chain/consensus/filcns/filecoin.go:186
- loading power actor state:
github.com/filecoin-project/lotus/chain/stmgr.minerHasMinPower
/opt/filecoin/chain/stmgr/actors.go:410
- load state tree:
github.com/filecoin-project/lotus/chain/stmgr.(*StateManager).ParentState
/opt/filecoin/chain/stmgr/read.go:28
- failed to load state tree bafy2bzaced4wmuzsqsbeap77zfypimgtpolgtuxgjutvrwwg2itt3fqe3yxua:
github.com/filecoin-project/lotus/chain/state.LoadStateTree
/opt/filecoin/chain/state/statetree.go:295
- failed to load hamt node:
github.com/filecoin-project/specs-actors/actors/util/adt.AsMap
/go/pkg/mod/github.com/filecoin-project/[email protected]/actors/util/adt/map.go:41
- ipld: could not find bafy2bzaced4wmuzsqsbeap77zfypimgtpolgtuxgjutvrwwg2itt3fqe3yxua
Repo CI log: https://github.com/ChainSafe/forest/actions/runs/8423386587/job/23080675345
@hanabi1224, why are you exporting with 900 recent stateroots? The default is 2000.
@LesnyRumcajs The minimum allowed value is CHAIN_FINALITY=900. If that's insufficient, we should update the CLI with a working minimum. Does that make sense?
The current minimum value matches the logic in Lotus. There may be a use case where setting it that low makes sense. At most, I'd add a warning, but I'd still leave enough rope for the user to hang himself with it. :)
I think you have an off-by-one issue somewhere. Under ideal conditions (i.e. no reorgs), 900 state roots should be enough. I tried forest-cli snapshot export -d=901 and it works, whereas 900 doesn't. I also ran forest-cli state fetch on the state root (SNAPSHOT_HEAD - 900) that Forest fails to load, and everything works after that as well.
2000 is the default because you really want at minimum 2 finality lengths of states in case of reorgs so that you can get the correct power table to verify winning tickets and other things that require a large lookback.
@ec2 Thanks for your investigation, there is indeed an off-by-one issue in the code.
Update: This is a problem in the daemon logic instead of the snapshot export logic. In this case, the heaviest tipset in the snapshot should be trusted and skipped.
(lotus version 1.26.1+calibnet+git.9dc9a5cf4 can now be bootstrapped with a d=900 snapshot while forest cannot, as long as the head of the snapshot remains unchanged. That said, if a d=900 snapshot is exported at the latest epoch, lotus likely cannot be bootstrapped either if the tipset at the head epoch changes.)
Repro steps:
# Export a forest snapshot
forest-cli snapshot export --skip-checksum -t 1516720 -d 900 -o forest_1516720.car.zst
# Export a lotus snapshot
lotus chain export --skip-old-msgs --tipset @1516720 --recent-stateroots 900 lotus_1516720.car
# The snapshots are confirmed identical
zstd -d forest_1516720.car.zst
cmp forest_1516720.car lotus_1516720.car
# Bootstrap forest from scratch
forest --chain calibnet --encrypt-keystore false --save-token /tmp/forest_token --import-snapshot forest_1516720.car.zst
# Got
# WARN forest_filecoin::chain_sync::tipset_syncer: Validating block [CID = bafy2bzaced5izwa2uuule3y2qtlvfmznq6rv3bi7o5rf243pqtbcgcxnxldwm] in EPOCH = 1516720 failed: Validation error: Validation error: Consensus error: StateManager error: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca, state_root_version=unknown parent_state=bafy2bzacec3zzdczp46lunfghkfemarev66a6mk346hwqo3562ofecemn73sw
# Bootstrap lotus from scratch
lotus daemon --remove-existing-chain --import-snapshot ~/fr/snapshots/calibnet/forest_1516720.car.zst
# Got
...
2024-04-11T19:28:11.230+0800 INFO chain chain/sync.go:625 block validation {"took": 4.91154541, "height": "1516721", "age": 6881.230240799}
...
When I print out {epoch} - {parent_states}
1515821 - bafy2bzacecbpxhmxofoiz6p5pjvjnjjizhbtbi2zfmy2wzbvgepury34pswau
1515820 - bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca
I can see that the missing bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca is the parent state at epoch 1515820.
Lotus started validating blocks at epoch 1516721 and succeeded, while forest started validating blocks at epoch 1516720 and failed.
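The epoch arithmetic behind this observation can be sketched as below. This is only an illustration, under two assumptions that the output above suggests but does not prove: block validation looks back exactly 900 epochs, and a depth-d export keeps parent states only for epochs strictly above head - d. The names LOOKBACK, oldest_epoch_with_state, and can_validate are hypothetical and do not come from Forest or Lotus.

```rust
/// Assumed lookback (in epochs) needed to validate a block; taken to be
/// equal to CHAIN_FINALITY for this illustration.
const LOOKBACK: i64 = 900;

/// Oldest epoch whose parent state is present in the snapshot, assuming the
/// export keeps parent states only for epochs strictly above `head - depth`.
fn oldest_epoch_with_state(head: i64, depth: i64) -> i64 {
    head - depth + 1
}

/// Whether a block at `epoch` can be validated from the snapshot alone,
/// i.e. whether the state at `epoch - LOOKBACK` is still in the snapshot.
fn can_validate(epoch: i64, head: i64, depth: i64) -> bool {
    epoch - LOOKBACK >= oldest_epoch_with_state(head, depth)
}

fn main() {
    let head = 1_516_720;
    // (export depth, epoch of the first block being validated)
    for (depth, epoch) in [(900, head), (900, head + 1), (901, head)] {
        println!(
            "depth={depth} epoch={epoch} validatable={}",
            can_validate(epoch, head, depth)
        );
    }
}
```

Under these assumptions, validating the snapshot head itself (1516720) with d=900 needs the state at 1515820, which is one epoch older than the snapshot holds. Starting at 1516721, or exporting with d=901, stays within range, which is consistent with the results reported in this thread.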
@lemmih Looking into the lotus code, it seems that the lotus Syncer validates tipsets in the (current_head+1)..=proposed_head range, while the forest Syncer validates tipsets in the current_head..=proposed_head range.
Lotus code: https://github.com/filecoin-project/lotus/blob/master/chain/sync.go#L732
blockSet := []*types.TipSet{incoming}

// Parent of the new (possibly better) tipset that we need to fetch next.
at := incoming.Parents()

// we want to sync all the blocks until the height above our
// best tipset so far
untilHeight := known.Height() + 1
...
loop:
    for blockSet[len(blockSet)-1].Height() > untilHeight {
        ...
        ts, err := syncer.store.LoadTipSet(ctx, at)
        ...
        blockSet = append(blockSet, ts)
        at = ts.Parents()
    }
Forest code:
let mut parent_tipsets = nonempty![proposed_head.clone()];
...
'sync: loop {
    let oldest_parent = parent_tipsets.last();
    ...
    // Check if we are at the end of the range
    if oldest_parent.epoch() <= current_head.epoch() {
        // Current tipset epoch is less than or equal to the epoch of
        // the tipset we are synchronizing toward, stop.
        break;
    }
    ...
    // Attempt to load the parent tipset from local store
    if let Some(tipset) = chain_store
        .chain_index
        .load_tipset(oldest_parent.parents())?
    {
        parent_blocks.extend(tipset.cids());
        parent_tipsets.push(tipset);
        continue;
    }
}
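To make the difference between the two stop conditions concrete, here is a simplified model, not the actual Lotus or Forest code. It assumes one tipset per epoch (no null rounds or forks), so the parent of epoch e is e - 1. The function names collect_epochs, lotus_style, and forest_style are hypothetical.

```rust
/// Walks parents back from `proposed_head` and returns the collected epochs,
/// newest first. Collection stops once the oldest collected epoch is at or
/// below `stop_at`. Simplifying assumption: the parent of epoch `e` is `e - 1`.
fn collect_epochs(proposed_head: i64, stop_at: i64) -> Vec<i64> {
    let mut epochs = vec![proposed_head];
    loop {
        let oldest = *epochs.last().expect("vector is never empty");
        if oldest <= stop_at {
            break;
        }
        epochs.push(oldest - 1);
    }
    epochs
}

/// Mirrors the quoted Lotus loop: keep going while height > known + 1.
fn lotus_style(current_head: i64, proposed_head: i64) -> Vec<i64> {
    collect_epochs(proposed_head, current_head + 1)
}

/// Mirrors the quoted Forest loop: break once epoch <= current_head.
fn forest_style(current_head: i64, proposed_head: i64) -> Vec<i64> {
    collect_epochs(proposed_head, current_head)
}

fn main() {
    let (current_head, proposed_head) = (1_516_720, 1_516_723);
    println!("lotus-style:  {:?}", lotus_style(current_head, proposed_head));
    println!("forest-style: {:?}", forest_style(current_head, proposed_head));
}
```

In this model the Forest-style walk collects one extra epoch, the current head itself. That is the tipset that then gets re-validated and needs the state root missing from a d=900 snapshot.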