iavl icon indicating copy to clipboard operation
iavl copied to clipboard

feat: expose nodeDB's DeleteVersionsFrom method

Open pirtleshell opened this issue 1 year ago • 8 comments

DeleteVersionsFrom is the counterpart to DeleteVersionsTo. It writes deletions of a MutableTree to disk for all versions >= the provided fromVersion.

It exposes the method of the same name from the underlying nodeDB.

This method is useful for things like:

  • rolling back an upgrade that adds a module (allows for the complete removal of a KVStore from disk)
  • custom implementations of sharded data drives (allows one to pare down a data drive to a specific set of versions)

Please let me know if this should first be proposed as a Feature Request via the repo issues.

Summary by CodeRabbit

  • New Features

    • Introduced DeleteVersionsFrom API to enable the deletion of versions starting from a specified version.
  • Bug Fixes

    • Enhanced the MutableTree with a new method to support deletion of versions from a specified version upwards.

pirtleshell avatar May 24 '24 18:05 pirtleshell

Walkthrough

This update introduces the DeleteVersionsFrom API for managing versions in the MutableTree data structure. It enhances version control by allowing deletion from a specified version onward. Key changes include the addition of new methods in mutable_tree.go, related tests in mutable_tree_test.go, and documentation updates in CHANGELOG.md. This feature complements the existing functionalities like async pruning of legacy nodes and the new SaveChangeSet API.

Changes

Files Change Summaries
CHANGELOG.md Summarized the addition of DeleteVersionsFrom(int64) API and its integration in PR #952.
mutable_tree.go Added DeleteVersionsFrom method to MutableTree struct; modified existing method.
mutable_tree_test.go Renamed TestDelete to TestDeleteVersionsTo and added TestDeleteVersionsFrom for new functionality.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MutableTree
    Client->>MutableTree: DeleteVersionsFrom(version int64)
    MutableTree-->>Client: Success/Error Response

Poem

In a forest of code, trees grow so tall,
A rabbit hops lightly to answer the call.
With DeleteVersionsFrom, old leaves fall away,
New branches can flourish, clear skies on display.
Cheers to the code, where changes are spun,
Version control under the warm, digital sun. 🌳🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai[bot] avatar May 24 '24 18:05 coderabbitai[bot]

@pirtleshell thanks for the contribution, it is used inLoadVersionForOverwriting to rollback the state as you mentioned, could you please provide more context of why the specific API is needed?

cool-develope avatar May 28 '24 18:05 cool-develope

@cool-develope sure thing.

rollback case

First, suppose you are adding a new sdk module during an upgrade. Suppose there is a bug in that module addition such that the root hash of the new module is non-deterministic & results in an app hash mismatch when the upgrade is run in a network. The goal is to roll back the bad version & re-run the upgrade with a patched binary.

The SDK's rollback cmd calls LoadVersionForOverwriting, which works by loading the previous version of the data. If an upgrade handler adds a module via StoreUpgrades, the rollback of the upgrade block will panic with an error that looks like:

Error: failed to rollback to version: version does not exist

This occurs when LoadVersionForOverwriting is called on the newly added module's store with the previous height, which does not exist because the module was just added.

AFAIK, to get around this and successfully roll back, you have two options:

  • use the previous binary to rollback, it doesn't know about the new module so won't attempt to load it
  • manually de-register the module from the root multistore so LoadVersionForOverwriting is never called for it

However, either of those options still leaves the upgrade version of the added module's store on disk.

When the upgrade is re-attemtped, the new data will not be committed because of the version that already exists on disk. The relevant sdk code is here: https://github.com/cosmos/cosmos-sdk/blob/c4d9a495052b78a02723ed1df1d336efe83a4e8d/store/rootmulti/store.go#L1188-L1197

The code is necessary to recover from interruptions, but means that new data is never committed & written to disk for the patched binary's upgrade block. If it were to manage calling commit, the IAVL Commit() method would likely error here due to mismatched data (the patched binary's version will differ from the bad one that was committed in the failed upgrade).

To truly rollback & re-run the upgrade, the data must be removed from disk.

sharding case

Less important with the awesome space-saving y'all have been doing with iavl v1 & v2, but DeleteVersionsFrom could also be used to create dbs for shards; ie. one server has blocks 1 to 2.5M, another has blocks 2.5M to 5M. You could create these dbs from a full archive node using DeleteVersionsTo and DeleteVersionsFrom


I recognize that much of what I describe about rollback could/should be accounted for higher up in the sdk (I've opened an issue: https://github.com/cosmos/cosmos-sdk/issues/20472). Exposing DeleteVersionsFrom is a just a simple non-breaking change that gives developers more control over the disk's data. It helped me resolve an app hash mismatch by rolling back an upgrade that added modules, so I figured it might help others as well, and is worth adding to this package.

Happy to make any changes, explain anything further, or open an issue/discussion.

pirtleshell avatar May 28 '24 19:05 pirtleshell

@cool-develope sure thing.

rollback case

First, suppose you are adding a new sdk module during an upgrade. Suppose there is a bug in that module addition such that the root hash of the new module is non-deterministic & results in an app hash mismatch when the upgrade is run in a network. The goal is to roll back the bad version & re-run the upgrade with a patched binary.

The SDK's rollback cmd calls LoadVersionForOverwriting, which works by loading the previous version of the data. If an upgrade handler adds a module via StoreUpgrades, the rollback of the upgrade block will panic with an error that looks like:

Error: failed to rollback to version: version does not exist

This occurs when LoadVersionForOverwriting is called on the newly added module's store with the previous height, which does not exist because the module was just added.

AFAIK, to get around this and successfully roll back, you have two options:

  • use the previous binary to rollback, it doesn't know about the new module so won't attempt to load it
  • manually de-register the module from the root multistore so LoadVersionForOverwriting is never called for it

However, either of those options still leaves the upgrade version of the added module's store on disk.

When the upgrade is re-attemtped, the new data will not be committed because of the version that already exists on disk. The relevant sdk code is here: https://github.com/cosmos/cosmos-sdk/blob/c4d9a495052b78a02723ed1df1d336efe83a4e8d/store/rootmulti/store.go#L1188-L1197

The code is necessary to recover from interruptions, but means that new data is never committed & written to disk for the patched binary's upgrade block. If it were to manage calling commit, the IAVL Commit() method would likely error here due to mismatched data (the patched binary's version will differ from the bad one that was committed in the failed upgrade).

To truly rollback & re-run the upgrade, the data must be removed from disk.

sharding case

Less important with the awesome space-saving y'all have been doing with iavl v1 & v2, but DeleteVersionsFrom could also be used to create dbs for shards; ie. one server has blocks 1 to 2.5M, another has blocks 2.5M to 5M. You could create these dbs from a full archive node using DeleteVersionsTo and DeleteVersionsFrom

I recognize that much of what I describe about rollback could/should be accounted for higher up in the sdk (I've opened an issue: cosmos/cosmos-sdk#20472). Exposing DeleteVersionsFrom is a just a simple non-breaking change that gives developers more control over the disk's data. It helped me resolve an app hash mismatch by rolling back an upgrade that added modules, so I figured it might help others as well, and is worth adding to this package.

Happy to make any changes, explain anything further, or open an issue/discussion.

makes sense, thanks for the detail explanation!

cool-develope avatar May 29 '24 17:05 cool-develope

@pirtleshell do you think we also need to update cosmos-sdk for the upgrade rollback case?

cool-develope avatar May 29 '24 17:05 cool-develope

@pirtleshell do you think we also need to update cosmos-sdk for the upgrade rollback case?

yes, i wanted to get this into iavl first but i do have a rough implementation of an update to the rollback command that fully removes the KVStore on disk for listed modules. i linked it in the issue i opened on https://github.com/cosmos/cosmos-sdk/issues/20472

probably needs cleanup but it is here: https://github.com/Kava-Labs/cosmos-sdk/pull/546 currently that code uses a forked iavl v1.2 that has the patch from this PR applied.

thinking through it right now, the cosmos-sdk versions that use iavl v0.20 can & probably should use DeleteVersionRange to accomplish the same.

pirtleshell avatar May 29 '24 18:05 pirtleshell

@cool-develope anything else needed for this PR? i think i may have confused the CI by rebasing on master to fix CHANGELOG conflict. 🙂

pirtleshell avatar Jun 04 '24 17:06 pirtleshell

bump @cool-develope @tac0turtle could someone trigger the Test CI so this can be auto-merged? thanks!

pirtleshell avatar Jun 21 '24 17:06 pirtleshell

bump @cool-develope @tac0turtle

is anything further needed for this to be included?

pirtleshell avatar Jul 15 '24 20:07 pirtleshell

@pirtleshell could merge main into your branch then we can merge it

tac0turtle avatar Jul 16 '24 09:07 tac0turtle

bumping again, hoping to get this merged @tac0turtle @cool-develope 🙏

pirtleshell avatar Jul 25 '24 21:07 pirtleshell