
Optimize EVM runtime storage footprint

Open raulk opened this issue 2 years ago • 15 comments

Currently, the built-in EVM actor uses a HAMT as the physical data structure representing Ethereum smart contract storage. We know this is an inefficient data structure to map, store, and query highly atomized/granular U256 => U256 entries. This epic is about:

  1. Modeling different Ethereum native workloads to benchmark the HAMT in terms of gas performance (obtain a baseline).
  2. Investigating and proposing alternative data structures that would lead to more efficient patterns of access and writing.
  3. Discussing and agreeing on one or two approaches to prototype.
  4. Prototyping and benchmarking them.
  5. Taking the implementation of the selected approach to completion.

raulk avatar Sep 09 '22 14:09 raulk

For point 1, I think the first thing we'll want to do is write several Solidity contracts that are write- and read-intensive in terms of data volume and dispersion of keys. We can use that suite as a benchmark harness and to establish an initial baseline.

We can also add real typical Ethereum contracts into the benchmarking suite — that way we can make sure we are not overfitting to read/write intensive scenarios.

raulk avatar Sep 09 '22 14:09 raulk

@Stebalien — could you review this issue for correctness of approach, and provide thoughts and guidance to guide @aakoshh's work?

raulk avatar Sep 09 '22 14:09 raulk

This looks correct. To extend it a bit:

  • We should fix https://github.com/filecoin-project/ref-fvm/issues/636 first.
  • We should test the HAMT with different parameters. HAMTs may actually work, but we'll probably need to tune the width and bucket size at a minimum.

Beyond that, we should try:

  • Optimizing for sequential storage (e.g., with some form of sparse array).
  • Optimizing the storage of small(er) numbers. We don't have to treat U256 values as byte arrays; we can treat them as integers when small (see the sketch after this list).
  • Optimizing the storage of keys. Currently, our HAMTs aren't "proper" tries and store the full keys in the leaves. Given our small values, we may want to optimize that.
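On the small-numbers point, here is a minimal sketch (not the actual actor code) of what treating U256 values as integers when small could look like: store only the significant big-endian bytes, so a value like 1 costs one byte instead of a fixed 32.

```rust
/// Encode a 256-bit value as its minimal big-endian byte string.
fn encode_compact(value: &[u8; 32]) -> Vec<u8> {
    let first = value.iter().position(|&b| b != 0).unwrap_or(32);
    value[first..].to_vec() // zero encodes as the empty byte string
}

/// Decode back to a full 32-byte word by left-padding with zeros.
fn decode_compact(bytes: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[32 - bytes.len()..].copy_from_slice(bytes);
    out
}

fn main() {
    let mut one = [0u8; 32];
    one[31] = 1;
    assert_eq!(encode_compact(&one), vec![1]);
    assert_eq!(decode_compact(&encode_compact(&one)), one);
}
```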

Stebalien avatar Sep 09 '22 15:09 Stebalien

As much as it pains me, we may also want to consider a custom "packed" format for the leaves of our merkle tree, given our highly regular data.

Stebalien avatar Sep 09 '22 15:09 Stebalien

Optimizing for sequential storage (e.g., with some form of sparse array).

Apparently, this won't help.

Stebalien avatar Sep 09 '22 16:09 Stebalien

^^ Solidity calculates storage slots by hashing, so unfortunately optimizing for sequential keys will probably not help much: https://docs.soliditylang.org/en/v0.8.13/internals/layout_in_storage.html

raulk avatar Sep 09 '22 17:09 raulk

I'm actually not so sure:

Except for dynamically-sized arrays and mappings ... data is stored contiguously item after item starting with the first state variable, which is stored in slot 0

In terms of dynamically sized arrays/maps:

  1. The length is stored in slot p.
  2. The data is stored in slots starting at hash(p).

So it looks like optimizing sequential storage may help quite a bit. Of course, that's if I'm reading this right...

Stebalien avatar Sep 09 '22 18:09 Stebalien

If that's the case:

  • We should optimize for "low" slots. We'll likely want to put "low" slots in the state's root.
  • We should optimize sequential slots at random (high) locations.
  • We should avoid double-hashing. Currently, HAMTs hash the keys to randomize locations, but we don't need that here. Keys should either be sequential or already randomized.

Stebalien avatar Sep 09 '22 18:09 Stebalien

We should avoid double-hashing. Currently, HAMTs hash the keys to randomize locations, but we don't need that here. Keys should either be sequential or already randomized.

Related to this, I'm looking at the part of the EVM storage layout docs where they illustrate the slots of a struct:

Let us compute the storage location of data[4][9].c. The position of the mapping itself is 1 (the variable x with 32 bytes precedes it). This means data[4] is stored at keccak256(uint256(4) . uint256(1)).

So it looks like the data is stored under the hash of the slot, which is determined by where the variable is declared in the Solidity file. Is there anything else the compiler adds to these hashes at all that helps distinguish it from other contracts' data at the same slot? Or should we perhaps always prefix the key by the contract ID that the data belongs to, to have unique hashes across different contracts?

aakoshh avatar Sep 14 '22 14:09 aakoshh

I wouldn't try to get into the semantics of how Solidity stores data inside its complex types because:

  • It changes across Solidity compiler versions
  • It makes the FEVM implementation incompatible with other EVM-compatible languages like Vyper.

karim-agha avatar Sep 14 '22 14:09 karim-agha

Is there anything else the compiler adds to these hashes at all that helps distinguish from other contracts' data at the same slot?

I realised this is a non-issue because every contract starts from a different root :man_facepalming:

aakoshh avatar Sep 14 '22 15:09 aakoshh

Related to this, I'm looking at the part of the EVM storage layout docs where they illustrate the slots of a struct: So it looks like the data is stored under the hash of the slot

Note: Struct data is stored inline. I.e., a struct with 2 uint256 integers takes up two adjacent slots.

The EVM would hash these slots with the contract's address to get the location in the storage trie. However, we don't need to do that because we have a separate tree per contract.

The case you're looking at has an array within an array. Array data is stored at the hash of the array's slot (where the array's length is stored in the slot itself).

I wouldn't try to get into the semantics of how Solidity stores data inside its complex types because:

We need to be EVM compatible, but we can try to optimize for expected layouts. We can also, in the future, add additional "layouts" that can be picked at deploy time.

Stebalien avatar Sep 14 '22 16:09 Stebalien

Okay, so we discussed in the engineering huddle what we might do to load more data at once from the IPLD store, in anticipation of it being useful.

There doesn't seem to be much we can do about mapping types, because their keys are derived by concatenation and hashing; but for arrays and structs the storage location is a hash plus an offset. For example, for x[i][j] where x has type uint24[][], the slot is keccak256(keccak256(p) + i) + floor(j / floor(256 / 24)). Ignoring the floor arithmetic, we can see that + i and + j fall outside the hashes.
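As a sanity check of the formula just quoted, here's a sketch of the slot derivation, assuming the tiny_keccak crate for Keccak-256 (any implementation would do); slots and indices are big-endian 256-bit words.

```rust
use tiny_keccak::{Hasher, Keccak};

fn keccak256(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut k = Keccak::v256();
    k.update(data);
    k.finalize(&mut out);
    out
}

/// A u64 widened to a big-endian 256-bit word.
fn u256(n: u64) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[24..].copy_from_slice(&n.to_be_bytes());
    w
}

/// Big-endian addition of a small offset to a 256-bit word.
fn add(mut w: [u8; 32], n: u64) -> [u8; 32] {
    let mut carry = n as u128;
    for i in (0..32).rev() {
        carry += w[i] as u128;
        w[i] = carry as u8;
        carry >>= 8;
    }
    w
}

/// x[i][j] where x: uint24[][] is declared at slot p:
/// keccak256(keccak256(p) + i) + j / (256 / 24)
fn nested_elem_slot(p: u64, i: u64, j: u64) -> [u8; 32] {
    let inner = add(keccak256(&u256(p)), i);
    add(keccak256(&inner), j / (256 / 24))
}

fn main() {
    // Consecutive runs of j land in consecutive slots:
    // the + j / 10 offset sits outside the hash.
    let a = nested_elem_slot(1, 0, 10);
    let b = nested_elem_slot(1, 0, 20);
    assert_eq!(add(a, 1), b);
}
```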

This means that arrays will indeed occupy contiguous slots, with the initial hash (based on the position where they are declared) ensuring different arrays are spread apart :crossed_fingers: We can potentially use this to load a whole run of items at once if we use a data structure such as a B-tree, with neighbouring items stored next to each other, provided they are also stored as one block like that in the IPLD block store.

At first glance the HAMT also resembles something that could fit, since it's a prefix tree. There are the following problems with it:

  • It hashes keys, destroying their contiguity. The reason is that these data structures benefit from not being too deep, and hashing randomizes the keys, keeping the overall tree depth O(log(n)). If we remove the hashing, the array keys can stay next to each other.
  • However, there will be quite an imbalance when the number of items in an array reaches the maximum bucket size, which is currently 3. At that point the Values pointer (which stores KV pairs in the Node) is replaced by a Link pointing at a new sub-node, and the values are added to that sub-node, only to fill the bucket again, and again, recursively all the way down until the first 248 bits of the key are consumed along 31 levels of Nodes. This is because the keys look like 101...00, 101...01, 101...10, 101...11, etc., sharing a long common prefix (see the sketch after this list). So loading the items of the array means following all 32 levels of Links in the IPLD store. At the lowest level, with a bit width of 8, there will be 256 items stored together.
  • The top-level variables will be concentrated in the first bucket, because their slots are 0, 1, 2, etc., which in 256-bit key terms means they again share a long common prefix. This isn't necessarily a problem if we expect them to be needed together.
  • Mapping keys will be randomly distributed because they are hashed. The problem with them could be that they will actually stay inside the root node, because there won't be enough items in the same bucket to cause them to sink into lower levels. That means retrieving the root node will potentially read a relatively large amount of data that might never be needed. Possibly we could have a rule that at depth 0 of the HAMT we don't store Pointer::Values.
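To make the common-prefix problem above concrete, here is a small sketch: with a bit width of 8 each HAMT level consumes one byte of the 256-bit key, so sequential slots, which differ only in the trailing byte, share their first 31 level indices and sink into the same deep subtree once hashing is removed.

```rust
/// With a bit width of 8, each level of the trie consumes one byte of the key.
fn level_index(key: &[u8; 32], level: usize) -> u8 {
    key[level]
}

fn main() {
    // Big-endian 256-bit representations of slots 0 and 1.
    let slot = |n: u8| { let mut k = [0u8; 32]; k[31] = n; k };
    let (a, b) = (slot(0), slot(1));
    let shared = (0..32)
        .take_while(|&l| level_index(&a, l) == level_index(&b, l))
        .count();
    println!("slots 0 and 1 share the first {shared} of 32 level indices"); // 31
}
```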

Ostensibly we could do something like the Ethereum MPT to skip/extend levels if we know there will be no branching from the root all the way down to the lowest level, and adjust if the array grows beyond the capacity of a single Node.

I will have a look at how a B*-tree would improve upon this.

And yes, this seems like optimising for this particular Solidity compiler strategy. But another way to look at it is that we're just removing the suboptimality of the HAMT to better fit what most EVM bytecode will do.

aakoshh avatar Sep 14 '22 21:09 aakoshh

When working with the Cosmos SDK, developers have the option to iterate key ranges in the store. They use balanced IAVL+ trees:

  • https://docs.cosmos.network/master/core/store.html
  • https://github.com/cosmos/iavl/blob/master/docs/overview.md

There's also a draft ADR to move to a Sparse Merkle Tree (SMT), which discusses some of the drawbacks of using the IAVL+ data structure for storage:

  • https://docs.cosmos.network/master/architecture/adr-040-storage-and-smt-state-commitments.html
  • https://paper.dropbox.com/published/State-commitments-and-storage-review-KeEB7eOd11pNrZvVtqUgL3h

They are proposing to use an SMT for state commitments only, while keeping the actual state data in a KV store that supports iteration and snapshots; the hashes of the keys and the values are inserted into the SMT. But their case is slightly unique in that Tendermint consensus has instant finality, so they don't need to maintain immutable, simultaneously accessible data structures like the HAMT; they can overwrite the data and rely on DB snapshots to go back if necessary. So a simple prefix in the keys is enough.

The SMT they are adopting is based on https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf

aakoshh avatar Sep 15 '22 11:09 aakoshh

NEAR's Aurora also allows EVM transactions to run on the blockchain with Wasm. They kept both gas models: https://doc.aurora.dev/compat/gas

The following could be relevant to gas costs calibration/estimation:

  • https://github.com/near/nearcore/tree/master/runtime/runtime-params-estimator
  • https://github.com/near/nearcore/labels/A-params-estimator
  • https://github.com/near/near-evm/tree/benchmark
  • https://github.com/near/nearcore/tree/master/runtime/near-test-contracts/estimator-contract
  • https://github.com/near/calibrator
  • https://docs.google.com/presentation/d/10nqplCDpmLdzeZbVx9POwBqEYH6aSFcZOgZPA2vMK_g/edit

Also found that IOTA has a gascalibration CLI tool to generate reports, using some of the EVM contracts here, exercised either from the CLI or in tests.

aakoshh avatar Sep 16 '22 12:09 aakoshh

I think the HAMT might be a fundamentally awkward data structure for on-disk storage.

A B-tree stores data in its leaves and ensures that traversal to those leaves is logarithmic and predictable for all keys.

A HAMT also stores full key-value pairs in the interior nodes along the way. These are entries I am not interested in when looking up a specific key, yet I pay at least the cost of retrieving them, and maybe of writing them back if I made a change in the node I was looking for. Unlike with a B-tree, how much of this extra data I will encounter is completely unpredictable, making it difficult to reason about costs.

This is not a problem for an in-memory representation, which simply bypasses them.

aakoshh avatar Sep 24 '22 06:09 aakoshh

Yeah, the unpredictable nature is definitely an issue. We "batch" like this because:

  • Lots of small blocks is inefficient in terms of round-trips and metadata.
  • We tend to have code that iterates over maps (and benefits from the locality).

But where we have purely random writes, this leads to write amplification.

These are entries I am not interested in when looking up a specific key, yet I pay at least the cost of retrieving them, and maybe of writing them back if I made a change in the node I was looking for.

I think any tree-based structure will have that issue, no?

Unlike with a B-tree, how much of this extra data I will encounter is completely unpredictable, making it difficult to reason about costs.

In the case of FEVM, this should be reasonably predictable. We can talk about maximum, minimum, and expected sizes.

This is not a problem for an in-memory representation, which simply bypasses them.

Well, to be fair, having to decode and re-encode a bunch of unrelated keys/values is not free. Really, one of the unfortunate issues with the Rust HAMT implementation is that it can't lazily decode "adjacent" objects (the Go implementation only decodes objects on first access).
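For illustration, a sketch (hypothetical types, not the actual ref-fvm API) of what lazy decoding of adjacent entries could look like: keep the raw bytes as read from the block and only decode an entry on first access.

```rust
use std::cell::OnceCell;

struct KeyValue { key: Vec<u8>, value: Vec<u8> } // stand-in for a HAMT entry

struct LazyEntry {
    raw: Vec<u8>,                // undecoded bytes, as read from the block
    decoded: OnceCell<KeyValue>, // populated on first access only
}

impl LazyEntry {
    fn get(&self) -> &KeyValue {
        self.decoded.get_or_init(|| decode(&self.raw))
    }
}

fn decode(_raw: &[u8]) -> KeyValue {
    // hypothetical: a real implementation would CBOR-decode here
    KeyValue { key: Vec::new(), value: Vec::new() }
}

fn main() {
    let entry = LazyEntry { raw: vec![0x80], decoded: OnceCell::new() };
    let kv = entry.get(); // decoded only now, and cached afterwards
    assert!(kv.key.is_empty() && kv.value.is_empty());
}
```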

Stebalien avatar Sep 24 '22 14:09 Stebalien

NEAR's Aurora also allows EVM transactions to run on the blockchain with Wasm. They kept both gas models: doc.aurora.dev/compat/gas

This is because NEAR allows transactions to be paid for with wrapped ETH bridged through their native Rainbow bridge. We don't model ETH because we are not contending with ETH, nor attempting to siphon value from the Ethereum network, nor attempting to serve as a replacement for it. That's why we decided to drop gas measurements that only make sense in an ETH-denominated economy/interaction.

raulk avatar Sep 26 '22 14:09 raulk

I think any tree-based structure will have that issue, no?

It wouldn't be the case if the tree only stored data in the leaves, because during traversal you would only retrieve data that helps you navigate the index to find the data you want. The HAMT has an index and key-value lists at each level, and if you were looking for something not among those key-values, they were retrieved as dead weight, unless there are many more reads and caching effects bring a benefit.

In the case of FEVM, this should be reasonably predictable. We can talk about maximum, minimum, and expected sizes.

Certainly that kind of analysis is possible if you know the history of the contract. It's not as simple as "height is logarithmic in the number of entries", because value size plays a role as well; but you are right, statistically the randomised keys will have predictability.

In any case I'm glad to hear you think that this isn't a problem for the FVM.

aakoshh avatar Sep 26 '22 15:09 aakoshh

It wouldn't be the case if the tree only stored data in the leaves, because during traversal you would only retrieve data that helps you navigate the index to find the data you want. The HAMT has an index and key-value lists at each level, and if you were looking for something not among those key-values, they were retrieved as dead weight, unless there are many more reads and caching effects bring a benefit.

Ah, I see. We do that to avoid additional round-trips on read/write. But it definitely amplifies writes.

It would be interesting to try a version that doesn't put values in intermediate nodes, but it might end up with a lot of small leaves.

Certainly that kind of analysis is possible if you know the history of the contract. It's not as simple as "height is logarithmic in the number of entries", because value size plays a role as well; but you are right, statistically the randomised keys will have predictability.

In the case of the FEVM, we do know the sizes of the values (32 byte keys, 32 byte values).

In the general case, this requires manual tuning which is kind of unfortunate.

Stebalien avatar Sep 27 '22 01:09 Stebalien

Summary

Here's a summary of what has happened so far in this epic across the various PRs, which can be found linked from the issues created to break it up into smaller tasks.

Baseline

  • https://github.com/filecoin-project/builtin-actors/pull/697 contains the baseline measurements
  • using the smart contract created in https://github.com/filecoin-project/builtin-actors/pull/689

The contract contains a bunch of scenarios designed to highlight the various inefficiencies we anticipated. Each scenario is rendered as a chart showing the bytes read/written during the contract call and the number of read/write calls. These show the footprint of the data structure and its configuration, but they don't capture cost, because execution time is not recorded (everyone's machine is different). We'll do gas cost calibration separately.

The PR contains a description of each scenario and an analysis of what we see in the data.

Optimisations

The following are the new tweaks to the HAMT to make it work better with the Solidity compiler's storage layout. These are all incremental changes, so the charts show the delta of each improvement; they are not against the baseline (except the first). To see the final effect: the measurement PRs are actually opened against the next-optimize-evm-859-storage-footprint branch; once we merge all of them, we can open a final PR against next, which will show the effect against the baseline itself.

Usually there are two PRs:

  • one in the ref-fvm repo that changes the HAMT and
  • another in the builtin-actors repo that changes the configuration of the EVM actor, shows the updated charts, and contains some analysis to explain the changes in the patterns we see.

Every change is backwards compatible and all the options are disabled by default.

Measurements can be run in the builtin-actors/src/actors/evm directory as follows (the updated charts will be in tests/measurements):

make all

Skip serialization when nothing changed

We noticed in the baseline that even pure reads write data to the store.

  • This has been fixed in https://github.com/filecoin-project/ref-fvm/issues/899 and
  • there's an equivalent issue open for the AMT https://github.com/filecoin-project/ref-fvm/issues/915

There are no measurements for this one; it was done together with the next point.

No hashing

There is only one PR here, just the EVM actor choosing a different hashing algorithm:

  • https://github.com/filecoin-project/builtin-actors/pull/730

This disables the hashing the HAMT normally applies, and it showed how deep the tree becomes in the places where the array entries live.

Extension / skipping

  • https://github.com/filecoin-project/ref-fvm/pull/930 adds a use_extensions option to the HAMT to skip levels and point at nodes further away, circumventing the deep paths introduced by not hashing.
  • https://github.com/filecoin-project/builtin-actors/pull/739 contains the updated measurements

Here we discussed that this change would be great to port to the AMT as well, but no issue has been created.

Minimum data depth

The previous point pretty much solved the issues with arrays; the next thing to look at was Solidity's mappings. These are stored under random keys, but what is noticeable about them is that a lot of the entries end up in buckets of the root node, making it huge before it's saturated enough for values to be pushed down into lower nodes. One idea to combat this was to keep the top levels of the HAMT for routing only.

  • https://github.com/filecoin-project/ref-fvm/pull/935 adds a min_data_depth option to the HAMT to not store data under certain levels
  • https://github.com/filecoin-project/builtin-actors/pull/740 shows the effect; after some experimentation a minimum depth of 2 seemed like a good compromise between lowering the amount of bytes vs the number of tree traversals.

Reduce bit width

After the last change, the root node is still quite big, because an EVM key-value pair is not that different from a CID plus an extension in terms of storage requirements. The only difference is that there can be 3 KV pairs in a bucket (something we could actually add to the configuration; currently it's hardcoded).

So far we have been using the default bit width of 8, which allows 256 slots in a node. This PR tries smaller bit widths to see how the number of bytes read/written changes versus the depth of the tree:

  • https://github.com/filecoin-project/builtin-actors/pull/741

The PR description links to a spreadsheet containing average values for all options, which makes the trade-offs easier to see. The value of 5, i.e. a 32-ary tree, seemed like a better choice than 8, but 4 or 6 could also be good; 5 is actually an outlier because it only stores 2 buckets in the lowest nodes.
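The arithmetic behind that comparison, as a quick sketch: a node has 2^w slots, 256-bit keys give a maximum depth of ceil(256 / w), and when w does not divide 256 the lowest level only gets the leftover bits, which is why 5 (256 mod 5 = 1, i.e. 2 slots) is the outlier.

```rust
fn main() {
    for w in [4u32, 5, 6, 8] {
        let slots = 1u32 << w;             // slots per node: 2^w
        let max_depth = (256 + w - 1) / w; // ceil(256 / w) levels
        let rem = 256 % w;                 // bits left for the lowest level
        let last_level = 1u32 << if rem == 0 { w } else { rem };
        println!("width {w}: {slots} slots/node, max depth {max_depth}, {last_level} slots at the lowest level");
    }
}
```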

Fix HashBits::next

While testing the different bit widths I ran into a bug: the tests that run the scenarios failed for bit width 6. This turned out to be a bug in the baseline HAMT, and a fix was submitted in

  • https://github.com/filecoin-project/ref-fvm/pull/945

The following PRs build on this fix; without it they don't work.

Push data down to leaves

During the discussion of what data structure would be ideal, I proposed storing key-value pairs only in the leaves, which should disperse mapping entries from the root and keep array entries together. This would make the HAMT work something like an unbalanced B-tree.

  • https://github.com/filecoin-project/ref-fvm/pull/946 adds a push_data_into_leaves option to the HAMT to do just this. It builds on the extension feature to point from the root all the way to the leaf; these pointers are split as more data is inserted, but the key-value pairs don't have to be copied unless the new entry lands in the same node.
  • https://github.com/filecoin-project/builtin-actors/pull/745 shows the effect on the measurements.

The previous settings, use_extensions and min_data_depth, work independently of each other: we can enable one, the other, or both. The new push_data_into_leaves does not compose with them; if enabled, the other two have no effect.

The measurements show that, for the EVM, enabling this over the combination of the previous two doesn't make sense, because the key-value entries are so small that using extended pointers is actually worse.

The feature could still make sense for regular FVM contracts that store larger values, where the effect of having to copy all of them if anything changes in the node that contains them is more pronounced.

Different bit widths have a huge impact on the running time of the tests. A bit width of 4 was found to be faster than 5 or 6; 8 was so slow I didn't even wait for it to finish. Even with 4, the measurement took twice as long as with the previous configuration, so this feature is only for use cases with big entries in the HAMT.

Picking the best values

There is no one-size-fits-all configuration that performs optimally for every conceivable smart contract. It's worth considering that perhaps the user should be allowed to send some hints during contract deployment, based on their anticipated usage patterns and measurements they have done on their own contract.

aakoshh avatar Oct 11 '22 10:10 aakoshh

This is a great summary, thanks @aakoshh!

It's worth considering that perhaps the user should be allowed to send some hints during contract deployment, based on their anticipated usage patterns and measurements they have done on their own contract.

Do you have any examples of such hints in mind? I wonder whether we can realistically expect Solidity developers to know enough about the internals of FEVM to provide these hints. Or do you think the hints could be high-level enough that they won't require too much knowledge of FEVM?

maciejwitowski avatar Oct 11 '22 14:10 maciejwitowski

@maciejwitowski by hints, all I could currently think of was the values for these settings.

How to obtain them is a different question. They would have to (1) generate a series of representative transactions for their smart contract and (2) do a parameter grid search, or a fancier optimisation technique, to find a combination of parameters that minimises gas usage; a sketch follows below.
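A minimal sketch of that grid search; run_scenarios is a hypothetical harness that replays the representative transactions under a given HAMT configuration and returns the total gas used.

```rust
#[derive(Clone, Copy, Debug)]
struct HamtConfig { bit_width: u32, min_data_depth: u32, use_extensions: bool }

fn run_scenarios(_cfg: HamtConfig) -> u64 {
    0 // hypothetical: replay the transactions and return total gas used
}

fn main() {
    let mut best: Option<(u64, HamtConfig)> = None;
    for bit_width in [4, 5, 6, 8] {
        for min_data_depth in [0, 1, 2] {
            for use_extensions in [false, true] {
                let cfg = HamtConfig { bit_width, min_data_depth, use_extensions };
                let gas = run_scenarios(cfg);
                // keep the configuration that minimises gas
                if best.map_or(true, |(g, _)| gas < g) {
                    best = Some((gas, cfg));
                }
            }
        }
    }
    println!("best configuration: {:?}", best.map(|(_, cfg)| cfg));
}
```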

For this we'd have to provide an easy-to-use environment where they can run their contracts with full instrumentation for gas tracking, which these PRs don't provide at all.

We don't have to start with this, though. Perhaps a good test of whether this would be useful is to take a handful of real-world contracts and show that they would benefit from actually different settings. If we do the experiment ourselves and they all converge to the same values, then it's not worth the trouble of making developers do such tests on their own.

aakoshh avatar Oct 11 '22 14:10 aakoshh

Just came across an old thread about the cost of loading data from the store in the EVM; it mentions that geth has an optimisation mechanism, similar to what I mentioned about Cosmos, that stores the data outside the state trie in the database itself to achieve native lookup speeds. To deal with reorgs, they keep the diffs of the latest, still-volatile block heights in memory.

aakoshh avatar Nov 02 '22 13:11 aakoshh

Yeah, this is somewhat interesting for us because we expose the merkle-tree to actors themselves.

  • We could (probably) keep the actor cache (top-level state-tree cache) between messages. To do that, we'd cache by state-root (i.e., map state-root-cid -> state-tree-cache); see the sketch after this list.
  • We could keep an address cache using the same mechanism.
  • We can't do much else because much of the state-reading logic happens inside actors themselves.
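A sketch of the first bullet with hypothetical types (the real state tree and CID types live in ref-fvm): a cache kept between messages and keyed by the state-root CID, so a mismatched root simply misses the cache rather than observing stale state.

```rust
use std::collections::HashMap;

type Cid = [u8; 32]; // stand-in for a real CID type

#[derive(Default)]
struct StateTreeCache; // stand-in for the decoded top-level state tree

struct MachineCaches {
    by_root: HashMap<Cid, StateTreeCache>, // state-root-cid -> state-tree-cache
}

impl MachineCaches {
    fn state_tree(&mut self, root: Cid) -> &StateTreeCache {
        self.by_root.entry(root).or_insert_with(|| {
            // hypothetical: load and decode the tree from the blockstore
            StateTreeCache::default()
        })
    }
}

fn main() {
    let mut caches = MachineCaches { by_root: HashMap::new() };
    let root = [0u8; 32];
    let _tree = caches.state_tree(root); // decoded once, reused across messages
}
```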

Stebalien avatar Nov 15 '22 23:11 Stebalien

There's still more work we can do here, but nothing that's going to land right now. I believe that a simple optimization before the final release wouldn't require a FIP so we may have room for improvement during buildernet... but probably not.

Summary is:

  1. We now have a significantly better worst-case for larger contracts.
  2. We're now ~38% slower for tiny contracts (a few state elements).

But I'm closing this issue out because all the work here has been completed.

Stebalien avatar Dec 07 '22 20:12 Stebalien