forest
Investigate whether we can match Lotus' snapshots byte-for-byte
Issue summary
Both Lotus and Forest have the ability to generate snapshots. However, it has come to light that Forest snapshots fail after a day or two due to unexplained forks in the blockchain. Therefore, our snapshots must differ from the Lotus snapshots, and we need to figure out why.
Tasks:
- [ ] Start with a fairly recent calibnet snapshot from Lotus. (Bootstrap with a Lotus snapshot from our DO Space and generate a new snapshot.)
- [ ] Initiate both Lotus and Forest with the calibnet snapshot.
- [ ] Export a new snapshot with the same settings (epoch, recent stateroots, etc) from both Forest and Lotus.
- [ ] Check if they are exactly the same.
- [ ] If they are not, skim through the Lotus and Forest code to find differences. Add those differences to a new issue.
Other information and links
Lotus snapshots for calibnet: https://cloud.digitalocean.com/spaces/forest-snapshots?i=88c522&path=lotus-calibnet%2F
Apparently the unexplained forks also happen with snapshots from Lotus. However, this issue is still important as we would like to prove that our snapshots are valid (and equivalent to those from Lotus).
The fork issue turned out to be unrelated to how we're generating snapshots. I will close this issue for now since it's not a big priority anymore. Byte-for-byte identical snapshots would be nice but it's definitely not necessary. May re-open this in the future if things change.
Re-opening with low priority.
Steps to get result:
- To get the latest snapshot for Forest, run `forest-cli --chain calibnet snapshot fetch --snapshot-dir .`. If the download succeeds, the snapshot is saved with the format `forest_snapshot_[network]_[date]_height_[epoch].car`.
- To get the latest snapshot for Lotus, run `forest-cli --chain calibnet snapshot fetch --snapshot-dir . --provider filecoin`. If the download succeeds, the snapshot is saved with the format `filecoin_snapshot_[network]_[date]_height_[epoch].car`.
- Import the `forest...` snapshot to Forest with `forest --chain calibnet --import-snapshot [file] --encrypt-keystore false`.
- Allow the Forest node to run until the result of `forest-cli sync wait` (in a separate terminal window) is `Done!`.
- Export the Forest snapshot with `forest-cli snapshot export`. When finished, shut down the Forest node before attempting to start the Lotus node.
- Import the `filecoin...` snapshot to Lotus with `lotus daemon --import-snapshot [file]` (remember to switch to the proper network with `make clean calibnet` first, if necessary).
- Allow Lotus to run until the result of `lotus sync wait` (in a separate terminal window) is `Done!`.
- Export the Lotus snapshot with `lotus chain export --recent-stateroots 2000 --skip-old-msgs [file]`.
- Compare bytes with `cmp [Lotus snapshot] [Forest snapshot]`.
The problem is that the files, although relatively similar in size, differ at byte 1, line 1, and running `cmp -l` produces a list showing nearly every byte differing. Perhaps each file has a different header, producing an offset that propagates through the rest of the file?
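One way to test the header hypothesis is to compare only the headers. The following is a minimal sketch, assuming the snapshots are CARv1 files: such a file begins with an unsigned LEB128 varint giving the header length, followed by the header itself (DAG-CBOR encoded, holding the root CIDs). The function names are illustrative and are not part of Forest or Lotus; the header is compared as raw bytes and is not decoded.

```python
import io


def read_varint(stream: io.BufferedIOBase):
    """Read an unsigned LEB128 varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise ValueError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def read_car_header(stream: io.BufferedIOBase) -> bytes:
    """Return the raw (still encoded) header bytes of a CARv1 stream."""
    length = read_varint(stream)
    if length is None:
        raise ValueError("empty stream")
    header = stream.read(length)
    if len(header) != length:
        raise ValueError("truncated header")
    return header


def headers_match(path_a: str, path_b: str) -> bool:
    """Compare the headers of two CAR files on disk (assumed CARv1)."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return read_car_header(a) == read_car_header(b)
```

If the headers differ, the roots probably differ too, which would suggest that the two snapshots were not exported at the same epoch. A difference at byte 1 would be consistent with headers of different lengths.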
They definitely won't match if the snapshots aren't for the same epoch.
Is there a way to ensure that the snapshots are exported at the same epoch?
I was able to get snapshots from the same epoch. They start to differ at byte 970488820 and then differ for the rest of the file (in the output, the lines above `cmp [Forest snapshot] [Lotus snapshot]` come from the `-l` option, which lists every differing byte).
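To put a number on how much of the two files is shared, the first differing offset can be turned into a fraction of the file size. This is a rough sketch that mimics the offset reported by `cmp` (1-based); the function names are hypothetical, and it assumes regular files or in-memory streams, where `read` returns full chunks until the end.

```python
import io


def first_difference(a: io.BufferedIOBase, b: io.BufferedIOBase,
                     chunk_size: int = 1 << 20):
    """Return the 1-based offset of the first differing byte, or None.

    If one stream is a strict prefix of the other, the offset just past
    the shorter stream is returned.
    """
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a == chunk_b:
            if not chunk_a:
                return None  # both streams ended without a difference
            offset += len(chunk_a)
            continue
        # The chunks differ: locate the exact byte inside this chunk.
        for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
            if x != y:
                return offset + i + 1
        # No differing byte in the overlap, so one stream ended early.
        return offset + min(len(chunk_a), len(chunk_b)) + 1


def identical_prefix_fraction(first_diff, total_size: int) -> float:
    """Fraction of the file that precedes the first differing byte."""
    if first_diff is None:
        return 1.0
    return (first_diff - 1) / total_size
```

Calling `identical_prefix_fraction` with the offset found here and the size of the snapshot file would show how far into the file the two exports agree.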

We should not expect the snapshots to match.
Forest logic differs from Lotus in the walk_snapshot method. In particular, Lotus seems to cover more cases (e.g., https://github.com/filecoin-project/lotus/pull/8691).
So the first step toward identical snapshots would be to match the logic in this method.
Great! So they're like 99.9% identical? Have a chat with @LesnyRumcajs about the differences between our walk function and theirs. There might be a simple way to go from 99.9% identical to 100% identical.
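Since the difference may come from how the chain is walked, it could also help to check whether the two files contain the same blocks regardless of order. Below is a rough sketch, assuming CARv1 layout (a varint-prefixed header followed by varint-prefixed sections, each holding a CID and its block data). All names are illustrative; sections are fingerprinted with SHA-256 instead of parsing the CIDs, and one digest per block is kept in memory.

```python
import hashlib
import io


def read_varint(stream: io.BufferedIOBase):
    """Read an unsigned LEB128 varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise ValueError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def car_sections(stream: io.BufferedIOBase):
    """Yield each raw section (CID bytes + block data), skipping the header."""
    header_len = read_varint(stream)
    if header_len is None:
        raise ValueError("empty stream")
    if len(stream.read(header_len)) != header_len:
        raise ValueError("truncated header")
    while True:
        length = read_varint(stream)
        if length is None:
            return
        section = stream.read(length)
        if len(section) != length:
            raise ValueError("truncated section")
        yield section


def section_digests(stream: io.BufferedIOBase) -> set:
    """Fingerprint every section so sets can be compared cheaply."""
    return {hashlib.sha256(s).digest() for s in car_sections(stream)}


def compare_block_sets(a: io.BufferedIOBase, b: io.BufferedIOBase):
    """Return (sections only in a, sections only in b) as counts."""
    digests_a = section_digests(a)
    digests_b = section_digests(b)
    return len(digests_a - digests_b), len(digests_b - digests_a)
```

A result of `(0, 0)` for files that still differ byte-for-byte would point to ordering alone; non-zero counts would suggest that one walk visits blocks the other skips.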