
Make initial root migration for converting ledgers gradual

Open cjjdespres opened this issue 5 months ago • 2 comments

Related to https://github.com/MinaProtocol/mina/pull/17874 and https://github.com/MinaProtocol/mina/issues/17570.

I realized that the initial ledger migration in the converting merkle tree is written as a single set_batch_accounts call over all the accounts in the ledger:

https://github.com/MinaProtocol/mina/blob/2c70f34e3a78a194fba5bb55230d5df5fec5efb5/src/lib/merkle_ledger/converting_merkle_tree.ml#L68-L75

This is going to cause the daemon to become unresponsive for that entire operation, as it will have to rehash all the accounts and recreate the auxiliary tables in the database in a single async job. This could cause networking bugs like the one fixed in https://github.com/MinaProtocol/mina/pull/17874; at the very least it will result in a poor user experience if the daemon randomly becomes unresponsive to commands.

We can likely fix this simply by having a new ledger interface method like set_batch_accounts that returns a unit Deferred.t, and which would do its various operations in chunks, giving the scheduler the opportunity to run other async jobs (like handling networking).
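A minimal sketch of the chunked approach, assuming the Core and Async libraries. The names set_batch_accounts_gradually, set_batch and chunk_size are hypothetical and are not part of the existing ledger interface; set_batch stands in for the existing synchronous set_batch_accounts.

```ocaml
open Core
open Async

(* Hypothetical helper, not the real ledger interface method.
   [set_batch] stands in for the synchronous set_batch_accounts;
   [chunk_size] is an arbitrary illustrative default. *)
let set_batch_accounts_gradually ?(chunk_size = 1024) ~set_batch accounts =
  Deferred.List.iter ~how:`Sequential
    (List.chunks_of accounts ~length:chunk_size)
    ~f:(fun chunk ->
      (* Write one chunk synchronously... *)
      set_batch chunk ;
      (* ...then give the scheduler a chance to run other async jobs
         (e.g. networking) before the next chunk. *)
      Scheduler.yield () )
```

The result is a unit Deferred.t that becomes determined once every chunk has been written. A smaller chunk_size yields to the scheduler more often, at the cost of more scheduling overhead.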

cjjdespres, Sep 29 '25 16:09

I wrote the method to do this while fixing up https://github.com/MinaProtocol/mina/pull/17816 (which already contains basically the method I want to write). It takes a Stable_db-backed root and turns it into a Converting_db root by gradually migrating the stable DB component. So, this can be fixed with this strategy:

  1. Wait for the above PR to be fixed and merged.
  2. Write a variant of Root.create that takes the directory_name of the stable DB and tries to open whatever kind of root exists at that directory_name. It would either: (1) open a Converting_db root if the converting DB associated with directory_name is also present and in sync, or (2) open a Stable_db root otherwise (and would optionally clean up and warn about an out-of-sync converting DB).
  3. Write a method for persistent_root.ml that can change the backing of the snarked root from stable to converting. It would use the method merged in (1) to do this, and would change the stored backing type in the factory as well.
  4. Do a few things at once:
    • Modify the create_root method in the genesis ledger code, so it simply copies (via checkpoint) whatever kind of root ledger was backing the genesis ledgers. It would act basically like the above Root.create variant - it would take just a directory_name, and would checkpoint whatever databases were backing the root to the location specified by directory_name. (The stable DB would be named directory_name and the converting DB would be directory_name ^ "_converting").
    • Use the new create_root method when we want to create a new root from genesis. Also use the new Root.create variant whenever the daemon tries to open a pre-existing root database (one not created by the currently running daemon).
    • If the daemon is set to maintain converting roots, call the methods in (1) and (3) right after we open up a snarked root or epoch ledger snapshot for the first time with the root creation variants.
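
The decision made by the Root.create variant in step 2 could be sketched roughly as below. This is only an illustration using the standard library: the types, the function choose_backing, and the dir_exists and in_sync parameters are hypothetical stand-ins, not the actual Root module. Only the directory_name ^ "_converting" naming is taken from the plan above.

```ocaml
(* Hypothetical stand-in types; the real Root module differs. *)
type backing = Stable_db | Converting_db

type opened =
  { backing : backing
  ; stale_converting_db : bool (* true if a warning/cleanup is due *)
  }

(* Naming convention from the plan: the converting DB sits next to
   the stable DB with a "_converting" suffix. *)
let converting_dir directory_name = directory_name ^ "_converting"

(* [dir_exists] and [in_sync] are hypothetical predicates supplied by
   the caller; how to check that the two DBs agree is left open. *)
let choose_backing ~dir_exists ~in_sync directory_name =
  let conv = converting_dir directory_name in
  if not (dir_exists conv) then
    { backing = Stable_db; stale_converting_db = false }
  else if in_sync ~stable:directory_name ~converting:conv then
    { backing = Converting_db; stale_converting_db = false }
  else
    (* Converting DB is present but out of sync: fall back to the
       stable DB and flag the stale converting DB. *)
    { backing = Stable_db; stale_converting_db = true }
```

Passing the predicates in as arguments keeps the sketch independent of any particular database API.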

Later, to solve https://github.com/MinaProtocol/mina/issues/17747, we can change where the root conversion methods are called. (This would be right after the initial snarked root and epoch ledger snapshot syncing).

cjjdespres, Oct 03 '25 16:10

After checking the code, this actually would be blocked by https://github.com/MinaProtocol/mina/issues/17747

glyh, Dec 10 '25 13:12