
Investigate whether we can match Lotus' snapshots byte-for-byte

Open lemmih opened this issue 1 year ago • 1 comments

Issue summary

Both Lotus and Forest have the ability to generate snapshots. However, it has come to light that Forest snapshots fail after a day or two due to unexplained forks in the blockchain. Therefore, our snapshots must differ from the Lotus snapshots, and we need to figure out why.

Tasks:

  • [ ] Start with a fairly recent calibnet snapshot from Lotus. (Bootstrap with a Lotus snapshot from our DO Space and generate a new snapshot.)
  • [ ] Initiate both Lotus and Forest with the calibnet snapshot.
  • [ ] Export a new snapshot with the same settings (epoch, recent stateroots, etc) from both Forest and Lotus.
  • [ ] Check if they are exactly the same.
  • [ ] If they are not, skim through the Lotus and Forest code to find differences. Add those differences to a new issue.

Other information and links

Lotus snapshots for calibnet: https://cloud.digitalocean.com/spaces/forest-snapshots?i=88c522&path=lotus-calibnet%2F

lemmih avatar Sep 07 '22 09:09 lemmih

Apparently the unexplained forks also happen with snapshots from Lotus. However, this issue is still important as we would like to prove that our snapshots are valid (and equivalent to those from Lotus).

lemmih avatar Sep 12 '22 15:09 lemmih

The fork issue turned out to be unrelated to how we're generating snapshots. I will close this issue for now since it's not a big priority anymore. Byte-for-byte identical snapshots would be nice but it's definitely not necessary. May re-open this in the future if things change.

lemmih avatar Oct 11 '22 12:10 lemmih

Re-opening with low priority.

lemmih avatar Dec 05 '22 10:12 lemmih

Steps to get result:

  • To get the latest snapshot for Forest, run forest-cli --chain calibnet snapshot fetch --snapshot-dir . (the trailing . is the target directory). If the download succeeds, the snapshot is saved as forest_snapshot_[network]_[date]_height_[epoch].car.
  • To get the latest snapshot for Lotus, run forest-cli --chain calibnet snapshot fetch --snapshot-dir . --provider filecoin. If the download succeeds, the snapshot is saved as filecoin_snapshot_[network]_[date]_height_[epoch].car.
  • Import the forest... snapshot into Forest with forest --chain calibnet --import-snapshot [file] --encrypt-keystore false.
  • Let the Forest node run until forest-cli sync wait (in a separate terminal window) reports Done!.
  • Export the Forest snapshot with forest-cli snapshot export. When it finishes, shut down the Forest node before attempting to start the Lotus node.
  • Import the filecoin... snapshot into Lotus with lotus daemon --import-snapshot [file] (remember to switch to the proper network with make clean calibnet first, if necessary).
  • Let the Lotus node run until lotus sync wait (in a separate terminal window) reports Done!.
  • Export the Lotus snapshot with lotus chain export --recent-stateroots 2000 --skip-old-msgs [file].
  • Compare the bytes with cmp [Lotus snapshot] [Forest snapshot].
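The final comparison step can also be scripted. The following is a minimal sketch, not part of either project: the function name first_difference and the chunk size are made up for illustration. It reads both files in chunks and returns the first differing offset, 1-based like the byte number that cmp prints.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the 1-based offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the offset just past the end
    of the shorter file is returned.
    """
    offset = 0  # number of bytes already confirmed equal
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i + 1
                # Common part is equal, so one file ended early.
                return offset + min(len(a), len(b)) + 1
            if not a:
                # Both files reached EOF at the same time without a mismatch.
                return None
            offset += len(a)
```

Dividing the returned offset by the file size gives a rough measure of how long the identical prefix is, which is cheaper than listing every differing byte with cmp -l.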

jdjaustin avatar Dec 14 '22 19:12 jdjaustin

The problem is that the files, although relatively similar in size, differ at byte 1, line 1, and cmp -l produces a list showing nearly every byte differing. Perhaps each file has a different header, producing an offset that propagates through the rest of the file?
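One way to test the header hypothesis is to look at the headers alone. The sketch below assumes both files are uncompressed CARv1 archives, which the thread does not confirm beyond the .car extension. A CARv1 file starts with an unsigned LEB128 varint giving the header length, followed by that many bytes of DAG-CBOR header (roots and version). The helper names read_varint and read_car_header are hypothetical.

```python
from typing import BinaryIO


def read_varint(stream: BinaryIO) -> int:
    """Decode one unsigned LEB128 varint from the stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of file inside varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last varint byte
            return result
        shift += 7


def read_car_header(path: str) -> bytes:
    """Return the raw (still CBOR-encoded) header bytes of a CARv1 file."""
    with open(path, "rb") as f:
        length = read_varint(f)
        header = f.read(length)
        if len(header) != length:
            raise ValueError("file is shorter than its declared header length")
        return header


def headers_match(path_a: str, path_b: str) -> bool:
    """True if both files carry byte-identical headers."""
    return read_car_header(path_a) == read_car_header(path_b)
```

If headers_match returns False and the two headers also differ in length, every later byte is shifted, which would be consistent with cmp -l reporting nearly every byte as different.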

jdjaustin avatar Dec 14 '22 19:12 jdjaustin

They definitely won't match if the snapshots aren't for the same epoch.

lemmih avatar Dec 15 '22 16:12 lemmih

They definitely won't match if the snapshots aren't for the same epoch.

Is there a way to ensure that the snapshots are exported at the same epoch?

jdjaustin avatar Dec 15 '22 16:12 jdjaustin

I was able to get snapshots from the same epoch. It appears that they start to differ at byte 970488820 and then differ for the rest of the file after that point (lines of output above cmp [Forest snapshot] [Lotus snapshot] are from using -l option flag to show all diffs).

[Screenshot from 2023-01-12 11-56-27]

jdjaustin avatar Jan 12 '23 18:01 jdjaustin

We should not expect the snapshots to match.

Forest logic differs from Lotus in the walk_snapshot method. In particular, Lotus seems to cover more cases (e.g., https://github.com/filecoin-project/lotus/pull/8691).

So the first step towards identical snapshots would be to match the logic in this method.
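Before comparing the walk logic line by line, it may help to know whether the two exports contain different blocks or the same blocks in a different order. The following is a rough sketch under the assumption that both files are uncompressed CARv1: after the header, the file is a sequence of sections, each a varint length followed by CID plus block data. The sketch hashes every section body and compares the resulting sets, ignoring order. All function names are hypothetical, and the digest sets are held in memory, which may be heavy for large snapshots.

```python
import hashlib
from typing import BinaryIO, Iterator, Optional, Set, Tuple


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Decode an unsigned LEB128 varint; return None on a clean end of file."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # EOF exactly at a section boundary
            raise EOFError("unexpected end of file inside varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def iter_sections(path: str) -> Iterator[bytes]:
    """Yield each section body (CID bytes followed by block data)."""
    with open(path, "rb") as f:
        header_len = read_varint(f)
        if header_len is None:
            raise ValueError("empty file")
        f.seek(header_len, 1)  # skip the header, relative to current position
        while True:
            length = read_varint(f)
            if length is None:
                return
            body = f.read(length)
            if len(body) != length:
                raise ValueError("truncated section")
            yield body


def section_digests(path: str) -> Set[bytes]:
    """SHA-256 digest of every section body in the file."""
    return {hashlib.sha256(body).digest() for body in iter_sections(path)}


def compare_block_sets(path_a: str, path_b: str) -> Tuple[int, int, int]:
    """Return (only in A, only in B, shared) section counts, ignoring order."""
    a, b = section_digests(path_a), section_digests(path_b)
    return len(a - b), len(b - a), len(a & b)
```

If both "only in" counts are zero, the snapshots hold the same blocks and only the traversal order differs. Non-zero counts would suggest that one walk visits blocks the other skips.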

LesnyRumcajs avatar Jan 13 '23 08:01 LesnyRumcajs

I was able to get snapshots from the same epoch. It appears that they start to differ at byte 970488820 and then differ for the rest of the file after that point (lines of output above cmp [Forest snapshot] [Lotus snapshot] are from using -l option flag to show all diffs).

Great! So they're like 99.9% identical? Have a chat with @LesnyRumcajs about the differences between our walk function and theirs. There might be a simple way to go from 99.9% identical to 100% identical.

lemmih avatar Jan 13 '23 08:01 lemmih