polkadot icon indicating copy to clipboard operation
polkadot copied to clipboard

DMP storage refactoring

Open sandreim opened this issue 3 years ago • 9 comments

This refactors DMP queue and proof storage to give parachains more flexibility around consumption of the messages.

List of things need to be completed:

  • [x] Improve public fn and module documentation
  • [x] Improve code comments
  • [x] Identify state invariants
  • [x] Introduce assert_storage_consistency_exhaustive to check them in tests
  • [x] Introduce type wrappers - WrappingIndex, MessageIdx, QueuePageIdx.
  • [x] Abstractions for ring buffer and message window
  • [x] unit tests for ring buffer and message window
  • [x] migration tests
  • [ ] Decide if dmq_contents_bounded is added as staging or v2/v3 API
  • [ ] Check kusama/polkadot queue size, to ensure the migration doesn't panic.

Cumulus companion: https://github.com/paritytech/cumulus/pull/1443

sandreim avatar Jul 12 '22 13:07 sandreim

The companion kind of needs to go in first, before this PR ever lands and could be released in a Polkadot/Kusama runtime. We'd also need all parachains to upgrade their collators, wouldn't we?

rphmeier avatar Jul 16 '22 01:07 rphmeier

The companion kind of needs to go in first, before this PR ever lands and could be released in a Polkadot/Kusama runtime. We'd also need all parachains to upgrade their collators, wouldn't we?

I haven't tested this yet, but my idea was that relay chain runtime upgrade goes in first and collator changes later. The change that matters from collator perspective, is the dmq_contents API and storage proof keys. The current changeset adds a new API and the new storage proof while the old way of doing things is preserved temporarily until collators upgrade. Do you have any concerns with this approach ?

sandreim avatar Jul 18 '22 09:07 sandreim

I did the first round of reviews. So far I don't see any problems with logic.

However, there are a few groups of issues I would like to see tackled before merging this.

Documentation. I left quite a bunch of requests to change the comments here and there. Those should improve the readability of code and improve the experience for people who are just jumping into this part of the codebase. One of the important things I noticed missing, is the description of public functions and their contracts, and the storage entries and their invariants.

Documentation is now up and should cover the API, storage and generally how it works.

Besides that, I see this DMP module is graduating from being a trivial module to having quite some logic. Hence, I think it would need a proper comment describing the general logic of the module. That is, mention that it is built on top of a ring buffer, why we are going through those hoops and other properties about the code.

Then, the DMP module itself juggles quite a bit of primitives (u64) and has quite a dense wall of low-level code. I can't help but think, that it could be better to bump the abstract level up a notch. Note that head & tail are used in quite specific patterns. Unsurprisingly, it looks like a ring buffer. So maybe we could introduce a struct RingBuf that has head & tail and that provides a nice clean set of methods to control it.

Examples include:

  • creates an iterator from head to tail, returning an index to a page.
  • maybe, a function that returns the number of items on the given page,
  • allocates one page and advances the tail.
  • flushes the data to storage. This one is tricky because fn drop is annoying because you cannot absolutely panic in there since that would bring down the whole process should double-panic happen. So probably an inherent method would be needed at least until we get rid of the native runtime.

Yeah, I agree, that was really too low level. It should be much easier to understand and work with via the RingBuffer and MessageWindow abstractions.

Besides that, the code could use some type wrappers, e.g. Page{Index}(u64).

Then, we should try to make the code more robust by identifying and communicating the invariants of the state. That alone would've improved the review and onboarding experience. But we can go further, and also add a function that checks if the invariants are broken and use that in tests. Take a look at the Hrmp module's storage and the assert_storage_consistency_exhaustive function specifically to see what I mean. I found a countless number of issues with it.

This is a very good idea. Done!

sandreim avatar Aug 02 '22 16:08 sandreim

/cmd queue -v RUST_LOG=remote-ext=debug,runtime=trace -c try-runtime $ --chain=polkadot-dev --execution=Wasm on-runtime-upgrade live --uri wss://polkadot-try-runtime-node.parity-chains.parity.io:443

sandreim avatar Aug 04 '22 14:08 sandreim

@sandreim https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721337 was started for your command "$PIPELINE_SCRIPTS_DIR/try-runtime-bot.sh" --chain=polkadot-dev --execution=Wasm on-runtime-upgrade live --uri wss://polkadot-try-runtime-node.parity-chains.parity.io:443. Check out https://gitlab.parity.io/parity/mirrors/polkadot/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 8-a4e9ca46-5669-4466-bac7-9e74d8d4c6f6 to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 04 '22 14:08 command-bot[bot]

@sandreim Command "$PIPELINE_SCRIPTS_DIR/try-runtime-bot.sh" --chain=polkadot-dev --execution=Wasm on-runtime-upgrade live --uri wss://polkadot-try-runtime-node.parity-chains.parity.io:443 has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721337 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721337/artifacts/download.

command-bot[bot] avatar Aug 04 '22 15:08 command-bot[bot]

/cmd queue -v RUST_LOG=remote-ext=debug,runtime=trace -c try-runtime $ --chain=kusama-dev --execution=Wasm on-runtime-upgrade live --uri wss://kusama-try-runtime-node.parity-chains.parity.io:443

sandreim avatar Aug 04 '22 16:08 sandreim

@sandreim https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721710 was started for your command "$PIPELINE_SCRIPTS_DIR/try-runtime-bot.sh" --chain=kusama-dev --execution=Wasm on-runtime-upgrade live --uri wss://kusama-try-runtime-node.parity-chains.parity.io:443. Check out https://gitlab.parity.io/parity/mirrors/polkadot/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment /cmd cancel 9-b4bd334d-d648-43ee-a1d7-3417c64a34db to cancel this command or /cmd cancel to cancel all commands in this pull request.

command-bot[bot] avatar Aug 04 '22 16:08 command-bot[bot]

@sandreim Command "$PIPELINE_SCRIPTS_DIR/try-runtime-bot.sh" --chain=kusama-dev --execution=Wasm on-runtime-upgrade live --uri wss://kusama-try-runtime-node.parity-chains.parity.io:443 has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721710 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/1721710/artifacts/download.

command-bot[bot] avatar Aug 04 '22 16:08 command-bot[bot]