Investigate migration path from nv15 to nv16.

Open lemmih opened this issue 2 years ago • 3 comments

Issue summary

It's unclear how we can support migrations without adding a lot of code complexity. This issue is meant to shed light on the matter and illuminate a sustainable path forward.

Sub-tasks:

  • Find or generate snapshots for mainnet and calibnet right before the nv16 upgrade. Upload the snapshots to our Digital Ocean Space.
  • Summarize the changes between nv15 and nv16. Actor IDs definitely changed, some gas calculations changed, drand calculation changed, what else was changed?
  • Propose a rough set of changes that would make Forest support the migration from nv15 to nv16.

Other information and links

lemmih, Jul 28 '22 13:07

Findings

  1. Actor IDs definitely changed

For the following actors, only the code CID has changed:

  • init
  • cron
  • account
  • power
  • miner
  • paymentchannel
  • multisig
  • reward
  • verifiedregistry

These are simple code migrations: the actor's code CID is swapped and its state is left untouched.

For the system and market actors there are both code and state changes. That's why there is dedicated logic for their migration.

The system actor needs to update the state tree with its new state, which now holds the ManifestData CID.

For the market actor, more work is involved in upgrading the actor state because of the support for UTF-8 string label encoding in deal proposals and pending proposals (see FIP-0027).
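The split between code-only and code-plus-state migrations described above could be sketched roughly as follows. This is only an illustration: `ActorState`, `Migration` and `migrate_actor` are hypothetical names, and CIDs are modelled as plain strings to keep the example dependency-free. It is not the spec-actors or Forest implementation.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an actor entry in the state tree.
/// In a real client both fields would be CIDs (assumption: strings here).
#[derive(Debug, Clone, PartialEq)]
struct ActorState {
    /// Code CID of the actor.
    code: String,
    /// CID of the actor's state root.
    head: String,
}

/// How an actor with a given old code CID gets migrated (hypothetical type).
enum Migration {
    /// Only the code CID changes; the state head is carried over as is.
    CodeOnly { new_code: String },
    /// Code and state both change; `migrate_head` computes the new state head.
    WithState {
        new_code: String,
        migrate_head: fn(&str) -> String,
    },
}

/// Looks up the migration registered for the actor's old code CID and applies it.
/// Returns `None` when no migration is registered for that code CID.
fn migrate_actor(
    actor: &ActorState,
    migrations: &HashMap<String, Migration>,
) -> Option<ActorState> {
    match migrations.get(&actor.code)? {
        Migration::CodeOnly { new_code } => Some(ActorState {
            code: new_code.clone(),
            head: actor.head.clone(),
        }),
        Migration::WithState {
            new_code,
            migrate_head,
        } => Some(ActorState {
            code: new_code.clone(),
            head: migrate_head(&actor.head),
        }),
    }
}
```

In this shape, init, cron, account, power, miner, paymentchannel, multisig, reward and verifiedregistry would each get a `CodeOnly` entry, while system and market would get `WithState` entries carrying their dedicated logic.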

  2. Some gas calculations changed

I don't think this concerns us. Gas metering can change at a given protocol upgrade for one or more actors, but that has no bearing on the migration because it doesn't modify blockchain data structures. Gas calculations should only affect code, and in our case the nv16 ref-fvm already supports the new gas changes.

  3. drand calculation changed

Ditto.

  4. What else changed?

Nothing else, as far as I can see.

Open questions

  • Pre-migration framework + caching: how much do we need a similar approach in Forest? Are there other alternatives? We can definitely skip this part at first. For reference, the old nv12 state migration in Forest took around 13-15 seconds.

  • Seen in Lotus: UpgradeRefuelHeight. What's Refuel for?

  • Migration logic is in spec-actors (Go actors). What is the future of this, given that clients have moved to builtin-actors (Rust actors) and ref-fvm? In an ideal world we might want shared migration logic.

  • Implement Lite migration?

    should allow for easy upgrades if actors code needs to change but state does not. Example provided above the function to perform all the migration duties. Check actors_version_checklist.md for the rest of the steps.

  • What are non-deferred actors in the context of a migration?

  • The migrationJobResult struct is using a states7 actor instead of a states8 one (in go spec-actors). Typo or are there some good reasons?

Changes rough proposal

To support nv15 to nv16 migration we need to:

  • [ ] Make forest sync again on nv15 and be able to support multiple network versions.
  • [ ] Understand existing forest migration framework (used in the past for nv12 migration). Can we reuse most of the code as is?
  • [ ] Implementation of the nv16 migration logic (replicating same logic as in spec-actors).
  • [ ] Implementation of unit tests covering this migration.
  • [ ] Implementation of a migration schedule that will select the right migration path.
  • [ ] Test the migration using the exported calibnet and mainnet snapshots, and measure the elapsed time and memory usage for each.
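The migration schedule item above could look something like the sketch below. All names (`Upgrade`, `Schedule`, `calibnet_schedule`, `placeholder_nv16_migration`) are hypothetical, and state roots are plain strings for brevity. The calibnet height 1044660 is inferred from the snapshot exported 1 epoch before the upgrade at height 1044659. The sketch also assumes the upgrade height itself still belongs to the old network version and that the migration fires exactly at that height; this convention should be checked against Lotus before relying on it.

```rust
/// Filecoin chain height, counted in epochs.
type ChainEpoch = i64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkVersion {
    V15,
    V16,
}

/// A state migration: takes the pre-upgrade state root and returns the new one.
/// (Assumption: roots are plain strings to keep the sketch dependency-free.)
type MigrationFn = fn(&str) -> String;

struct Upgrade {
    /// Epoch at which the upgrade is triggered.
    height: ChainEpoch,
    /// Network version in effect after the upgrade.
    version: NetworkVersion,
    /// State migration to run at `height`, if the upgrade needs one.
    migration: Option<MigrationFn>,
}

struct Schedule {
    /// Version in effect before the first upgrade in the list.
    initial_version: NetworkVersion,
    /// Upgrades sorted by ascending height.
    upgrades: Vec<Upgrade>,
}

impl Schedule {
    /// Network version in effect at `epoch`.
    /// Assumption: the new version applies strictly after the upgrade height.
    fn version_at(&self, epoch: ChainEpoch) -> NetworkVersion {
        self.upgrades
            .iter()
            .rev()
            .find(|u| epoch > u.height)
            .map(|u| u.version)
            .unwrap_or(self.initial_version)
    }

    /// Migration to run at `epoch`, if an upgrade is scheduled exactly there.
    fn migration_at(&self, epoch: ChainEpoch) -> Option<MigrationFn> {
        self.upgrades
            .iter()
            .find(|u| u.height == epoch)
            .and_then(|u| u.migration)
    }
}

/// Placeholder standing in for the real nv16 migration logic.
fn placeholder_nv16_migration(old_root: &str) -> String {
    format!("{old_root}+nv16")
}

/// Hypothetical calibnet schedule covering only the nv15 -> nv16 upgrade.
fn calibnet_schedule() -> Schedule {
    Schedule {
        initial_version: NetworkVersion::V15,
        upgrades: vec![Upgrade {
            height: 1_044_660,
            version: NetworkVersion::V16,
            migration: Some(placeholder_nv16_migration),
        }],
    }
}
```

Keeping the schedule as data rather than hard-coded branches would let the same lookup serve mainnet and calibnet, and later upgrades only add entries.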

Test snapshots

For testing a calibnet migration two snapshots have been exported with Lotus:

  • lotus_snapshot_2022-Aug-5_height_1044460.car
  • lotus_snapshot_2022-Aug-5_height_1044659.car

They were exported 200 epochs and 1 epoch before the Skyr upgrade, respectively (the 200-epoch version could be useful if we decide to implement a pre-migration like in Lotus).

For testing a mainnet migration, one snapshot has been retrieved from the PL S3 bucket.

  • minimal_finality_stateroots_1955760_2022-07-05_00-00-00.car

This one is 4560 epochs before the upgrade. If needed, we can extract closer snapshots later.

Those snapshots have been uploaded to our Digital Ocean Spaces.
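As a quick sanity check on the distances quoted above, here is a small sketch. The upgrade heights are inferred from the snapshot heights and offsets given in this comment (1044659 + 1 for calibnet, 1955760 + 4560 for mainnet), the 30-second epoch duration is Filecoin's block time, and the function names are made up for the example.

```rust
/// Filecoin epoch duration in seconds.
const EPOCH_DURATION_SECONDS: i64 = 30;

/// Skyr (nv16) upgrade heights, inferred from the snapshot offsets in this thread.
const CALIBNET_SKYR_HEIGHT: i64 = 1_044_660;
const MAINNET_SKYR_HEIGHT: i64 = 1_960_320;

/// Number of epochs between a snapshot and the upgrade it precedes.
fn epochs_before(upgrade_height: i64, snapshot_height: i64) -> i64 {
    upgrade_height - snapshot_height
}

/// Wall-clock time covered by a number of epochs, in seconds.
fn epochs_to_seconds(epochs: i64) -> i64 {
    epochs * EPOCH_DURATION_SECONDS
}
```

With these numbers the mainnet snapshot sits 136800 seconds (38 hours) ahead of the upgrade, while the 200-epoch calibnet snapshot sits 100 minutes ahead.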

elmattic, Aug 08 '22 07:08

Just to note, what changed between versions is maintained in the tpm repo, e.g. all the changes in NV15 -> NV16

LesnyRumcajs, Aug 10 '22 07:08

> Just to note, what changed between versions is maintained in the tpm repo, e.g. all the changes in NV15 -> NV16

Thanks! That will be super useful!

elmattic, Aug 10 '22 08:08

Let's add the findings into a markdown document somewhere.

LesnyRumcajs, Sep 07 '22 15:09