
API for EE caching


The current block data limit seems to be around 32-64k, which is challenging given the work done by @s1na and @poemm.

A potential idea is to allow EEs to keep some cache. (Not sure whose idea it was.)

Introduce two new EEI methods:

  • loadCache(dst_ptr: u32, offset: u32, len: u32)
  • saveCache(src_ptr: u32, offset: u32, len: u32)

where dst_ptr and src_ptr are memory pointers in the wasm instance, and offset and len refer to a range within the cache space. There is a single linear cache space.

For the initial prototyping let's agree on a fixed-size cache space of 256k and a cache lifetime of 1 block, i.e. EEs can pass data on to the next block.

One may also want to consider a cache expiry based on block time.

An empty (fresh) or invalidated cache is pre-filled with zero bytes. EEs should leave a non-zero canary byte so they can tell whether the cache is empty or has been set.
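To make the proposed interface concrete, here is a minimal guest-side sketch of how an EE might use it, written in Rust compiled to wasm. The import module, the canary-at-byte-0 layout and the 32-byte payload are illustrative assumptions, not part of the proposal.

```rust
// Hypothetical guest-side usage of the proposed loadCache/saveCache EEI
// methods. The cache layout (canary byte at offset 0, 32-byte payload
// after it) is an assumption made only for this sketch.
extern "C" {
    fn loadCache(dst_ptr: u32, offset: u32, len: u32);
    fn saveCache(src_ptr: u32, offset: u32, len: u32);
}

const CANARY: u8 = 1;

fn recompute_payload() -> [u8; 32] {
    [0u8; 32] // placeholder for the real (expensive) computation
}

#[no_mangle]
pub extern "C" fn main() {
    // Byte 0 is the canary, bytes 1..33 hold data cached by a previous block.
    let mut buf = [0u8; 33];
    unsafe { loadCache(buf.as_mut_ptr() as usize as u32, 0, buf.len() as u32) };

    if buf[0] != CANARY {
        // Fresh or invalidated cache: recompute the payload from scratch.
        buf[0] = CANARY;
        buf[1..33].copy_from_slice(&recompute_payload());
    }

    // ... use buf[1..33] during execution ...

    // Persist the (possibly updated) buffer for the next block.
    unsafe { saveCache(buf.as_ptr() as usize as u32, 0, buf.len() as u32) };
}
```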

axic avatar Jul 25 '19 14:07 axic

Cache is similar to a state root -- persistent data stored with the contract instance. So perhaps loadCache() and saveCache() can be merged with eth2_loadPreStateRoot() and eth2_savePostStateRoot().

I imagined that each contract would buy some persistent cache, whether it be 0 bytes, 32 bytes for a state hash, or 64kB for whatever. This data would get mapped to the beginning of Wasm memory at call-time. And anything left at the beginning of Wasm memory would get saved as cache for the next run. Maybe this is too restrictive.
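For illustration, the call-time mapping could look roughly like this from the EE's side. The window size is an assumption, and dereferencing address 0 like this is only meaningful inside the wasm sandbox (it is not sound Rust in general):

```rust
// Sketch of the "cache mapped to the start of memory" idea: the host
// copies the contract's persistent cache into linear memory [0, CACHE_SIZE)
// before the call, and whatever is left in that window after the call is
// saved as the cache for the next run. No load/save host functions needed.
const CACHE_SIZE: usize = 64 * 1024; // illustrative size only

#[no_mangle]
pub extern "C" fn main() {
    // View the reserved window at the very start of linear memory.
    let cache = unsafe { core::slice::from_raw_parts_mut(0usize as *mut u8, CACHE_SIZE) };

    // Read whatever the previous run left behind, e.g. a 32-byte root.
    let _previous_root = &cache[0..32];

    // Update the window in place; the host persists it after the call.
    cache[0..32].copy_from_slice(&[1u8; 32]);
}
```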

poemm avatar Jul 25 '19 15:07 poemm

A potential idea is to allow EEs to keep some cache. (Not sure whose idea it was.)

This was proposed by @loredanacirstea in an ethresearch post.

Having the EEs store some state / do caching is, I think, a decent compromise compared to having a separate "master shard" with stateless EEs.

Mikerah avatar Jul 25 '19 22:07 Mikerah

@poemm I don't think cache is similar to a state root. Cache is non-persistent storage, while the state root is persistent.

Extending the "stateroot" was discussed and the general understanding is that the cost of deploying a new EE could scale with the persistent storage it wants, but nothing larger than 128 bytes was considered.

A non-persistent cache seems to be less invasive to the protocol. Not sure yet if it helps enough EEs to be worthwhile, but we should try it out.

axic avatar Jul 26 '19 11:07 axic

The main thing I wonder is how the transactions will be coordinated between blocks.

Assuming the cache is generated at block N, the transactions that intend to use the cache need to be executed in block N + 1. It seems like the block proposer for slot N + 1 could be bribed to not include the transaction(s) in that block -- essentially wasting the ether spent computing the cache in the first place -- without being penalized.

I think this can mostly be avoided by making the cache last more than a couple blocks, but that would increase the complexity. For example, when the shard committee is shuffled, does the new committee need to recompute the last X blocks to determine the cache?

I think it would be helpful for me to better understand the motivation (e.g. the challenges @s1na and @poemm are facing) behind a cache, and why it is infeasible to simply split the work into smaller chunks that can easily fit inside a block.

lightclient avatar Jul 31 '19 15:07 lightclient

the transactions that intend to use the cache need to be executed in block N + 1

In my opinion transactions shouldn't have a say in what will be cached. That would open the door to crafted transactions that make the cache useless, as you described.

I think the idea is that the EE returns a data buffer to be cached in addition to the post state root. An idea by @cdetrio and @poemm is, for example, caching the top layers of the merkle tree to reduce proof sizes. Reducing proof sizes and thereby fitting more txes into a block is, I'd say, the main motivation.
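As a back-of-the-envelope illustration of that saving (the tree depth, hash size and cached level below are assumptions, not numbers from this thread): caching all the nodes at one level near the top of the tree is cheap, but shortens every membership proof below it.

```rust
// Back-of-the-envelope: effect of caching one full level of a binary
// Merkle tree on per-leaf proof size. The depth, hash size and cached
// level are assumed purely for illustration.
const TREE_DEPTH: u32 = 24;   // root at level 0, leaves at level 24
const CACHED_LEVEL: u32 = 8;  // cache all 2^8 = 256 nodes at level 8
const HASH_BYTES: u32 = 32;

fn main() {
    let cache_bytes = (1u32 << CACHED_LEVEL) * HASH_BYTES;      // 8192 B (~8 KiB)
    let proof_without = TREE_DEPTH * HASH_BYTES;                // 768 B per leaf
    let proof_with = (TREE_DEPTH - CACHED_LEVEL) * HASH_BYTES;  // 512 B per leaf
    println!("cache: {cache_bytes} B, proof per leaf: {proof_without} B -> {proof_with} B");
}
```

The saving compounds when a block touches many leaves, since every witness in the block gets shorter.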

For example, when the shard committee is shuffled, does the new committee need to recompute the last X blocks to determine the cache?

Should be similar to how they recompute the post state root, I'd think, or is there an additional challenge for the cache?

s1na avatar Jul 31 '19 15:07 s1na

for example caching the top layers of the merkle tree

Ahhh, I see now! This makes a lot of sense then. I can't think of any issues with adding this functionality.

Should be similar to how they recompute the post state root, I'd think, or is there an additional challenge for the cache?

You're right, I forgot about the lookahead they have on the next shard they'll be assigned to -- so it is essentially the same as computing the post state root.

lightclient avatar Jul 31 '19 16:07 lightclient

Regarding https://github.com/ewasm/scout/issues/18#issuecomment-515238741

My proposal was for a global cross-shard cache, to avoid the delay and cost of cross-shard transactions. It seems that the issue here is a per-shard/EE cache, due to implementation restrictions on block size when proofs need to be computed. So, the per-EE cache idea is not from me.

loredanacirstea avatar Aug 03 '19 20:08 loredanacirstea

I was experimenting with the cache API in scout.ts: https://github.com/ewasm/scout.ts/pull/3. Some thoughts:

  1. I couldn't come up with a clean way to have both a cache lifetime and a single linear cache space. If we want lifetimes we might have to have a slot-based cache: e.g. the EE saves a hash to a cache slot in block i, and that slot will be cleared in block i + ttl.
  2. For a lifetime of 1, I worked around this by having a read-only and a write-only cache: loadCache reads from the read-only cache and saveCache saves to the write-only cache, which gets persisted to the next block (it becomes the next block's read-only cache). See the sketch after this list.
  3. You can build persistent storage on top of an expiring cache: every time simply copy everything from previous block's cache to next block's cache. I'm still confused as to why an expiring cache would be less invasive to the protocol than a fixed-size persistent storage.
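For reference, here is a minimal host-side sketch of the double-buffering described in point 2, independent of the actual scout.ts code; the names and the fixed 256k size are illustrative.

```rust
// Host-side sketch of a one-block-lifetime cache using two buffers:
// the EE reads from `read_only` (loadCache) and writes to `write_only`
// (saveCache); at the end of the block the write buffer becomes the
// next block's read buffer. Bounds checks are elided for brevity.
const CACHE_SIZE: usize = 256 * 1024;

struct BlockCache {
    read_only: Vec<u8>,  // what loadCache serves during this block
    write_only: Vec<u8>, // what saveCache fills during this block
}

impl BlockCache {
    fn new() -> Self {
        Self { read_only: vec![0; CACHE_SIZE], write_only: vec![0; CACHE_SIZE] }
    }

    // Backs the loadCache(dst_ptr, offset, len) host function.
    fn load(&self, offset: usize, len: usize) -> &[u8] {
        &self.read_only[offset..offset + len]
    }

    // Backs the saveCache(src_ptr, offset, len) host function.
    fn save(&mut self, offset: usize, data: &[u8]) {
        self.write_only[offset..offset + data.len()].copy_from_slice(data);
    }

    // Called once per block: rotate buffers and start from a zeroed write side.
    fn end_of_block(&mut self) {
        self.read_only = std::mem::replace(&mut self.write_only, vec![0; CACHE_SIZE]);
    }
}
```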

s1na avatar Aug 14 '19 14:08 s1na

3. You can build persistent storage on top of an expiring cache: every time simply copy everything from previous block's cache to next block's cache. I'm still confused as to why an expiring cache would be less invasive to the protocol than a **fixed-size** persistent storage.

Agreed, persistent cache can be built on temporary cache. Also, temporary cache can be built on persistent cache. So I think that we should choose one, and let apps build whatever they need on top of it. I vote for persistent cache because I believe that it is cleaner.
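To illustrate the other direction (a temporary cache emulated on top of persistent storage), one simple scheme is to tag every stored entry with the block number it was written in and treat anything older than some TTL as expired. A sketch, with the field layout and TTL purely illustrative:

```rust
// Sketch: an expiring cache built on persistent storage by tagging
// entries with the block number they were written in. The layout and
// the TTL policy are assumptions for illustration only.
struct CachedEntry {
    written_at_block: u64,
    data: Vec<u8>,
}

// Return the data only if it was written within the last `ttl` blocks,
// treating anything older as if it had been cleared.
fn read_with_ttl(entry: &CachedEntry, current_block: u64, ttl: u64) -> Option<&[u8]> {
    if current_block.saturating_sub(entry.written_at_block) < ttl {
        Some(&entry.data)
    } else {
        None
    }
}
```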

I am still hoping that the first response in this thread gets support -- that cache is handled just like the 32-byte stateRoot, but it is always mapped to the beginning of memory.

One disagreement: fixed-size is fine at first, but I vote to eventually add an interface for growing and shrinking the cache.

poemm avatar Aug 14 '19 17:08 poemm

that cache is handled just like the 32-byte stateRoot, but it is always mapped to the beginning of memory.

To be honest I think this is not a great idea, as it would require compiler support in every language people want to write EEs in for reserving memory areas.

axic avatar Aug 15 '19 11:08 axic

I also think that calling it a "cache" is wrong when it is persistent. A cache can be invalidated and thrown away and the system should be able to recover (by regenerating the cache), while a persistent storage cannot be thrown away.

If we are talking about extending a persistent storage, just call it storage and not cache.

axic avatar Aug 15 '19 11:08 axic

To be honest I think this is not a great idea as it would require compiler support for every language people want to write EEs in to support reservation of memory areas.

OK, it is possible that this is not a great idea. But I wish to clarify some things.

  • We already have persistent storage which writes to memory at call-time -- Wasm data segments. This would make them mutable. Of course, implementations should have freedom to implement it without using data segments.
  • For language support, many compilers like LLVM already have flags to choose where to start allocating memory.
  • Persistent storage mapped to memory removes the need for load*() and save*() host functions. Edit: it may be useful to have a loadFromAccount(address, start, length).
  • A poster above does not see how expiring cache would be less invasive to the protocol than persistent storage.

poemm avatar Aug 15 '19 12:08 poemm

A poster above does not see how expiring cache would be less invasive to the protocol than persistent storage.

Who is that poster above?

axic avatar Aug 15 '19 15:08 axic

Who is that poster above?

@s1na wrote:

3. You can build persistent storage on top of an expiring cache: every time simply copy everything from previous block's cache to next block's cache. I'm still confused as to why an expiring cache would be less invasive to the protocol than a **fixed-size** persistent storage.

poemm avatar Aug 15 '19 17:08 poemm