substrate
substrate copied to clipboard
Child trie and state machine refactors
This PR expose the refactor from my current transient storage branch: https://github.com/cheme/substrate/tree/transient).
The transient branch changeset is big too and I want to split it in different PR (and LOT of testing is still needed in the target branch, and maybe taking the Rfc/PPP process).
This first PR is focusing on substrate changes with no additional functionalities. This will allow to focus this review on backward compatibility. But some changes merely make sense when related to the full change set and consulting the mentioned full branch may be useful.
No need to merge this right now, it is more a preparatory PR to show and discuss changes and direction, so low priority.
Looking at it, I also wonder if removing ChildInfo enum for just name or ChildType + name in read api would be better.
In this PR
-
Change sp-externalities:
- storage access functions becomes &mut function: all in all this is not strictly necessary.
- by allowing to write on read access allow evolution such as:
-
append optimizing: only write the appended content on write (store a Vec<Append>), and only resolve (concatenate and cache the changes) on read (for https://github.com/paritytech/polkadot-sdk/issues/30). - access analysis: cache access info. - value returns from Vec
to Cow<[u8]>: this is rather incomplete and may be good to rollback at this point. Simply in sp_io we end up doing into own (would need some change to wasm interface proc macro to pass function context in the with_externalityscope and return the u64 there, and only for the case of wasm calling host: a bit beyond this pr scope). - do not yet allow local block caching without locking or inner-mutability, would need to change state-machine backend trait similarily, but a few things in place makes it hard to do (I stopped on the runtime fetcher (need clone on non mutuable), it is only this, there may be easy way to handle this by adding runtime fetching to the state-machine backend, but again a bit beyond this PR scope.
- storage access functions becomes &mut function: all in all this is not strictly necessary.
- by allowing to write on read access allow evolution such as:
-
-
remove change trie extrinsic indexing in state-machine
In full branch
Generally I am not very happy with multiplexing some function under ChildInfo.
-
Blob and Btree transient storage are part of the child tree mechanism. This choice could allow easier implementation of non transient variant, but there may be some awkward interfacing (choosing between a generit child state function or a dedicated one).
-
Blob
-
chunked data (every changes are stored by chunks).
- A big concern here is reconstruction (concatenating all chunks), this could be cached with the mutability change (not implemented yet).
- Chunk size is a bit random too, my only concern was to make it small enough and still aligned with most chunks (eg the chunk of blake3 tree).
- A big concern here is reconstruction (concatenating all chunks), this could be cached with the mutability change (not implemented yet).
-
hashing is done on new host function that keep trace of hasher: this is very discutable
- blob changes are still in memory when using cumulus: may want to have them also in host side.
- the hasher state is stored in the host memory (in statemachine overlay).
- polkadot should use the host function (and have a change overlay attach when executing pvf).
- blob changes are still in memory when using cumulus: may want to have them also in host side.
-
-
api
- mix of encoded non encoded hashes, is a bit ~
polkadot companion: https://github.com/paritytech/polkadot/pull/6550 cumulus companion: https://github.com/paritytech/cumulus/pull/2087
Putting this out of draft, some change can be a bit controversial (can rollback them and specialize changes to the transient storage api only):
- [] storage function using range of bytes as input parameter (and range not using rust trait).
- [] storage function returning Cow<[u8]> to avoid instantiating a Vec when hitting the change overlay. Which in practice will be instantiated on the wasm boundary.
- [] externality switching most & function to &mut, but not state machine backend (in practice we got often write on read access but some of those are part of state machine backend).
- [] moving partially to use DefaultChild rather than ChildInfo, generally specializing to a given child type (considering child state can have very different api). But at the same time target branch keep using child info in state machine (sometime it feels a bit forcefull). -> might just go fully to avoid ChildInfo as much as possible already.
- [] OverlayChanges renamed to Changes: could only be local to state machine only (and exported as OverlayChanges).
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
+1 for removing ChildInfo enum. Will make code so much easier to follow.
DefaultChild is a bit of a confusing name. It is not a child after all, but a child tree as far a I understand.
Are there any other type of child tries other than "default" planned in the immediate future?
+1 for removing
ChildInfoenum. Will make code so much easier to follow.DefaultChildis a bit of a confusing name. It is not a child after all, but a child tree as far a I understand.
Yes it as actually more a DefaultMerkleTrie or DefaultTrie, actuallly TrieDefault (a merkle trie using the default substrate state) would make more sense or TrieTop (a merkle trie using the top trie state definition).
TrieTop might actually not be future proof, so I think I will switch to TrieDefault unless I got a better idea.
(need to first rebase on master has branch went stale and I also got changes from https://github.com/w3f/PPPs/pull/11 to report on https://github.com/cheme/substrate/tree/transient (which also uses this branch))
Are there any other type of child tries other than "default" planned in the immediate future?
There is the transient btree storage and the blob storage (for the second one it is more a child state). Both are actually not child in the sense that their root or hash are not always attached. It is more an attached transient state. In my implementation (IIRC), the choice is something that can be discussed. It is more attached content (unless runtime write the root or hash in state and archive the transient info then it can be accessed as child state from a rpc or other). Going in the child state definition direction is only really interesting in case it could switch to being persisted, but this is really not even plan (would need a different db storage). So only benefit is to get something a bit similar in code, but there is only a few functions that handle the enum due to different api, and I remember thinking it would be good to separate them in some case (to make things more readable).
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
will soon.
This PR expose the refactor from my current transient storage branch: https://github.com/cheme/substrate/tree/transient).
This branch will effectively address https://github.com/paritytech/polkadot-sdk/issues/245?
This PR expose the refactor from my current transient storage branch: https://github.com/cheme/substrate/tree/transient).
This branch will effectively address paritytech/polkadot-sdk#245?
Yes it target this one, but it is a version with many host function, I am testing a different approach in branch "https://github.com/cheme/substrate/tree/transient-global-mem" where I just expose a global wasm memory (change overlay with transaction support) and a few host function (for hashing and storing).
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
https://github.com/paritytech/substrate/pull/13940 is more prioritary for now, this can be close until then I think.