Exporting a snapshot with depth=900 might not be sufficient to bootstrap a forest or lotus node

Open hanabi1224 opened this issue 1 year ago • 7 comments

Describe the bug

To Reproduce Steps to reproduce the behaviour:

  1. forest --chain calibnet --encrypt-keystore false --no-gc --height=-900 --auto-download-snapshot
  2. forest-cli snapshot export -d=900
  3. forest --chain calibnet --encrypt-keystore false --import-snapshot [SNAPSHOT] or lotus daemon --remove-existing-chain --import-snapshot [SNAPSHOT] fails with the errors below
7u5nd7v6ovgi4dyitvsc757ayemvbmdg bafy2bzacecxlqvpliee2eliwtrw2f4jg2tmsxgamsaiphhm6guxprljkv57jw]: collectChain failed: collectChain syncMessages: message processing failed: validating block bafy2bzacebd5w4eezqkhbilor4xu3gvteaoquc762v2ucllhjdqmwmhtpbh4c: 1 error occurred:
        * determining if miner has min power failed:
    github.com/filecoin-project/lotus/chain/consensus/filcns.(*FilecoinEC).ValidateBlock.func2
        /opt/filecoin/chain/consensus/filcns/filecoin.go:186
  - loading power actor state:
    github.com/filecoin-project/lotus/chain/stmgr.minerHasMinPower
        /opt/filecoin/chain/stmgr/actors.go:410
  - load state tree:
    github.com/filecoin-project/lotus/chain/stmgr.(*StateManager).ParentState
        /opt/filecoin/chain/stmgr/read.go:28
  - failed to load state tree bafy2bzaced4wmuzsqsbeap77zfypimgtpolgtuxgjutvrwwg2itt3fqe3yxua:
    github.com/filecoin-project/lotus/chain/state.LoadStateTree
        /opt/filecoin/chain/state/statetree.go:295
  - failed to load hamt node:
    github.com/filecoin-project/specs-actors/actors/util/adt.AsMap
        /go/pkg/mod/github.com/filecoin-project/[email protected]/actors/util/adt/map.go:41
  - ipld: could not find bafy2bzaced4wmuzsqsbeap77zfypimgtpolgtuxgjutvrwwg2itt3fqe3yxua

Repo CI log: https://github.com/ChainSafe/forest/actions/runs/8423386587/job/23080675345

Log output

Expected behaviour

Screenshots

Environment (please complete the following information):

  • OS:
  • Branch/commit

Other information and links

hanabi1224 avatar Mar 26 '24 08:03 hanabi1224

@hanabi1224, why are you exporting with 900 recent stateroots? The default is 2000.

LesnyRumcajs avatar Mar 26 '24 08:03 LesnyRumcajs

@LesnyRumcajs The minimum allowed value is CHAIN_FINALITY=900. If that's insufficient, we should update the CLI with a working minimum. Does that make sense?

hanabi1224 avatar Mar 26 '24 09:03 hanabi1224

The current minimum value matches the logic in Lotus. There may be a use case where setting it that low makes sense. At most, I'd add a warning, but I'd still leave enough rope for the user to hang himself with it. :)

LesnyRumcajs avatar Mar 26 '24 09:03 LesnyRumcajs

I think you have an off-by-one issue somewhere. Under ideal conditions (i.e. no reorgs), 900 state roots should be enough. I tried forest-cli snapshot export -d=901 and it works, whereas 900 doesn't. I also ran forest-cli state fetch on the state root (SNAPSHOT_HEAD - 900) that Forest fails to load, and everything works after that as well.

2000 is the default because you really want at minimum 2 finality lengths of states in case of reorgs so that you can get the correct power table to verify winning tickets and other things that require a large lookback.

ec2 avatar Apr 01 '24 19:04 ec2

@ec2 Thanks for your investigation. There is indeed an off-by-one issue in the code.

hanabi1224 avatar Apr 02 '24 08:04 hanabi1224

Update: This is a problem in the daemon logic rather than the snapshot export logic. In this case, the heaviest tipset in the snapshot should be trusted and skipped.

(Lotus version 1.26.1+calibnet+git.9dc9a5cf4 can now be bootstrapped with a d=900 snapshot while Forest cannot, as long as the head of the snapshot remains unchanged. That said, if a d=900 snapshot is exported at the latest epoch, Lotus likely cannot be bootstrapped either once the tipset at the head epoch changes.)

Repro steps:

# Export a forest snapshot
forest-cli snapshot export --skip-checksum -t 1516720 -d 900 -o forest_1516720.car.zst
# Export a lotus snapshot
lotus chain export --skip-old-msgs --tipset @1516720 --recent-stateroots 900 lotus_1516720.car

# The snapshots are confirmed identical
zstd -d forest_1516720.car.zst
cmp forest_1516720.car lotus_1516720.car

# Bootstrap forest from scratch
forest --chain calibnet --encrypt-keystore false --save-token /tmp/forest_token --import-snapshot forest_1516720.car.zst

# Got
# WARN forest_filecoin::chain_sync::tipset_syncer: Validating block [CID = bafy2bzaced5izwa2uuule3y2qtlvfmznq6rv3bi7o5rf243pqtbcgcxnxldwm] in EPOCH = 1516720 failed: Validation error: Validation error: Consensus error: StateManager error: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca, state_root_version=unknown parent_state=bafy2bzacec3zzdczp46lunfghkfemarev66a6mk346hwqo3562ofecemn73sw

# Bootstrap lotus from scratch
lotus daemon --remove-existing-chain --import-snapshot ~/fr/snapshots/calibnet/forest_1516720.car.zst
# Got
...
2024-04-11T19:28:11.230+0800    INFO    chain   chain/sync.go:625       block validation        {"took": 4.91154541, "height": "1516721", "age": 6881.230240799}
...

When I print out {epoch} - {parent_states}

1515821 - bafy2bzacecbpxhmxofoiz6p5pjvjnjjizhbtbi2zfmy2wzbvgepury34pswau
1515820 - bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca

I can see that the missing state root bafy2bzacec2vt33g6ydokkuj5k6ljvhrgoeo5enxntqalsui6a32wnmx6ckca is the parent state of epoch 1515820, i.e. 1516720 - 900.

Lotus validated blocks starting from epoch 1516721 and succeeded, while Forest validated blocks starting from epoch 1516720 and failed.
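
The arithmetic behind this difference can be sketched roughly as follows. This is a simplified model, not Forest or Lotus code. It assumes that validating a tipset needs the state from CHAIN_FINALITY (900) epochs earlier, and that a snapshot of depth d keeps states only for the d most recent epochs up to the head, which matches the observation above that 1515821 is present and 1515820 is missing. The function names are hypothetical.

```rust
/// Lookback distance assumed for this sketch (CHAIN_FINALITY in the thread).
const CHAIN_FINALITY: i64 = 900;

/// Oldest epoch whose parent state is assumed to be kept by a snapshot
/// exported at `head` with the given `depth` (hypothetical helper).
fn oldest_state_epoch(head: i64, depth: i64) -> i64 {
    head - depth + 1
}

/// Returns true if the lookback state needed to validate the tipset at
/// `epoch` falls inside the range kept by the snapshot.
fn can_validate(epoch: i64, head: i64, depth: i64) -> bool {
    epoch - CHAIN_FINALITY >= oldest_state_epoch(head, depth)
}

fn main() {
    let head = 1_516_720;
    // Validation starting at the snapshot head itself, depth 900.
    println!("head, d=900:     {}", can_validate(head, head, 900));
    // Validation starting one epoch above the snapshot head, depth 900.
    println!("head + 1, d=900: {}", can_validate(head + 1, head, 900));
    // Validation starting at the snapshot head, depth 901.
    println!("head, d=901:     {}", can_validate(head, head, 901));
}
```

Under these assumptions, only the first case fails: validating the head tipset with d=900 asks for the state of epoch 1515820, one epoch older than what the snapshot keeps.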

hanabi1224 avatar Apr 11 '24 11:04 hanabi1224

@lemmih Looking into the Lotus code, it seems that the Lotus Syncer validates tipsets in the (current_head+1)..=proposed_head range, while the Forest Syncer validates tipsets in the current_head..=proposed_head range.

Lotus code: https://github.com/filecoin-project/lotus/blob/master/chain/sync.go#L732

blockSet := []*types.TipSet{incoming}

// Parent of the new (possibly better) tipset that we need to fetch next.
at := incoming.Parents()

// we want to sync all the blocks until the height above our
// best tipset so far
untilHeight := known.Height() + 1
...
loop:
	for blockSet[len(blockSet)-1].Height() > untilHeight {
		...
		ts, err := syncer.store.LoadTipSet(ctx, at)
		...
		blockSet = append(blockSet, ts)
		at = ts.Parents()
	}

Forest code:

let mut parent_tipsets = nonempty![proposed_head.clone()];
...
'sync: loop {
	let oldest_parent = parent_tipsets.last();
	...
	// Check if we are at the end of the range
	if oldest_parent.epoch() <= current_head.epoch() {
		// Current tipset epoch is less than or equal to the epoch of
		// Tipset we are synchronizing toward, stop.
		break;
	}
	...
	// Attempt to load the parent tipset from local store
	if let Some(tipset) = chain_store
		.chain_index
		.load_tipset(oldest_parent.parents())?
	{
		parent_blocks.extend(tipset.cids());
		parent_tipsets.push(tipset);
		continue;
	}
}
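
To make the difference between the two loops concrete, here is a minimal sketch that replaces tipsets with bare epoch numbers and assumes there are no null rounds, so each parent is exactly one epoch lower. Both functions are hypothetical illustrations of the stop conditions quoted above, not the real Syncer implementations.

```rust
/// Mirrors the Forest stop condition: keep walking back until the oldest
/// collected epoch is <= the current head, so the head itself is included.
fn forest_style_range(current_head: i64, proposed_head: i64) -> Vec<i64> {
    let mut epochs = vec![proposed_head];
    loop {
        let oldest = *epochs.last().unwrap();
        if oldest <= current_head {
            break;
        }
        // Stand-in for loading the parent tipset.
        epochs.push(oldest - 1);
    }
    epochs
}

/// Mirrors the Lotus stop condition: walk back only while the oldest
/// collected epoch is above `known + 1`, so the known head is excluded.
fn lotus_style_range(known: i64, incoming: i64) -> Vec<i64> {
    let until_height = known + 1;
    let mut epochs = vec![incoming];
    while *epochs.last().unwrap() > until_height {
        let oldest = *epochs.last().unwrap();
        // Stand-in for loading the parent tipset.
        epochs.push(oldest - 1);
    }
    epochs
}

fn main() {
    // Snapshot head from the repro steps, syncing two epochs ahead.
    let head = 1_516_720;
    println!("forest: {:?}", forest_style_range(head, head + 2));
    println!("lotus:  {:?}", lotus_style_range(head, head + 2));
}
```

In this model the Forest-style range ends at the current head (1516720) while the Lotus-style range ends one epoch above it (1516721), which is consistent with the behaviour described in the previous comment.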

hanabi1224 avatar Apr 18 '24 13:04 hanabi1224