ethermint icon indicating copy to clipboard operation
ethermint copied to clipboard

adr: draft ADR for stateful precompiled contracts

Open yihuang opened this issue 2 years ago • 15 comments

ref: #1116

Description

Example implementation: https://github.com/yihuang/ethermint/tree/precompiled

IBC example:


For contributor use:

  • [ ] Targeted PR against correct branch (see CONTRIBUTING.md)
  • [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • [ ] Code follows the module structure standards.
  • [ ] Wrote unit and integration tests
  • [ ] Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • [ ] Added relevant godoc comments.
  • [ ] Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • [ ] Re-reviewed Files changed in the Github PR explorer

For admin use:

  • [ ] Added appropriate labels to PR (ex. WIP, R4R, docs, etc)
  • [ ] Reviewers assigned
  • [ ] Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

yihuang avatar Jun 16 '22 09:06 yihuang

This is awesome! But collides a bit with the designed proposed in https://github.com/evmos/ethermint/pull/1272 with a more modular proposition. We should try to decide towards one model and work together towards that model. If we agree the other proposal is the way to go we could either close this or adapt it to meet the modular design. Open for discussion 🙏

facs95 avatar Sep 01 '22 21:09 facs95

This is awesome! But collides a bit with the designed proposed in #1272 with a more modular proposition. We should try to decide towards one model and work together towards that model. If we agree the other proposal is the way to go we could either close this or adapt it to meet the modular design. Open for discussion 🙏

I think the difference is mainly stateful vs stateless, we can rework this one after the stateless part is done.

yihuang avatar Sep 02 '22 01:09 yihuang

This draft ADR is similar to the work I did back in February 2022:

go-ethereum

https://github.com/loredanacirstea/go-ethereum/commit/abd62c321076e4b94438896bf1353affe4b641a7

  1. Changes for making precompiles dynamic:
  • add a func getPrecompiles(chainRules params.Rules) map[common.Address]PrecompiledContract function, used to instantiate the precompiles in NewEVM
  1. Changes for making precompiles stateful
  • adding evm *EVM, caller ContractRef information when running a precompile - see RunPrecompiledContract

Changes 1 and 2 were mostly isolated in https://github.com/loredanacirstea/go-ethereum/commit/abd62c321076e4b94438896bf1353affe4b641a7.

  1. Other changes for advanced precompiles that are original work
  • CallWithState, RunWithInitialState - for the interpreter precompile
  • giving access to the internal chain state, for introspection precompile

ethermint

Then, ethermint could just use the NewEVM constructor with custom precompiles https://github.com/loredanacirstea/ethermint/commit/5a8c3a7a6469ad6f8a8d3df84afce7ce989407ac

But I think @fedekunze's approach is superior because it does not require go-ethereum changes. So, we don't need to expend effort in supporting a geth fork or in convincing the geth team to change their source.

loredanacirstea avatar Sep 15 '22 08:09 loredanacirstea

This draft ADR is similar to the work I did back in February 2022:

Yeah, It's directly inspired by your work, the main development is to make stateful precompiles work with statedb snapshot and revert.

yihuang avatar Sep 15 '22 08:09 yihuang

@yihuang We should define what stateless and stateful precompiles are. Because I see several categories:

  1. Stateless - as in they do not need to keep any precompile-specific state, nor do they need other state than what they receive as direct input. The equivalent of pure functions in Solidity.
  2. State-viewing - precompiles do not have their own state, but they use internal state (e.g. introspection precompile, where you can get info about transactions/blocks/events). E.g. view functions in Solidity
  3. Stateful - precompiles that need to keep their own internal state (even if temporary)
  4. State-changing - precompiles that change the global state (EVM sstore, etc.). E.g. mutable/non-payable/payable functions in Solidity
  5. [update] transaction initiator - precompiles that initiate transactions (e.g. EVM transactions through the account abstraction precompile that I developed)

loredanacirstea avatar Sep 15 '22 08:09 loredanacirstea

@yihuang We should define what stateless and stateful precompiles are. Because I see several categories:

  1. Stateless - as in they do not need to keep any precompile-specific state, nor do they need other state than what they receive as direct input. The equivalent of pure functions in Solidity.
  2. State-viewing - precompiles do not have their own state, but they use internal state (e.g. introspection precompile, where you can get info about transactions/blocks/events). E.g. view functions in Solidity
  3. Stateful - precompiles that need to keep their own internal state (even if temporary)
  4. State-changing - precompiles that change the global state (EVM sstore, etc.). E.g. mutable/non-payable/payable functions in Solidity

Hmm, what I mean here is the contract modifies the global state, specifically, modifies the cosmos-sdk states, like bank transfers, or sending ibc packets.

yihuang avatar Sep 15 '22 08:09 yihuang

@yihuang @loredanacirstea. All of the above makes sense, I've been doing some research into getting state changing precompile working and ultimately am realizing that the StateDB is the problem.

IMO we need to find a way to make it so that an EVM txn reverting reverts the Cosmos txn as well. The design decision to make a reverting EVM transaction a successful cosmos transaction is a huge antipattern in my opinion, and is the root of the StateDB commit() revert() issue.

itsdevbear avatar Sep 22 '22 16:09 itsdevbear

@yihuang @loredanacirstea. All of the above makes sense, I've been doing some research into getting state changing precompile working and ultimately am realizing that the StateDB is the problem.

IMO we need to find a way to make it so that an EVM txn reverting reverts the Cosmos txn as well. The design decision to make a reverting EVM transaction a successful cosmos transaction is a huge antipattern in my opinion, and is the root of the StateDB commit() revert() issue.

This ADR proposed a solution to this problem, with examples.

yihuang avatar Sep 22 '22 16:09 yihuang

@yihuang correct. But the issue is that we can't call arbitrary module logic, in the way that we can pass CosmosMsg's from CosmWasm VM to the module directly. Going this route, would required duplicating logic that is possibly already defined in it's keeper.

itsdevbear avatar Sep 22 '22 16:09 itsdevbear

@yihuang correct. But the issue is that we can't call arbitrary module logic, in the way that we can pass CosmosMsg's from CosmWasm VM to the module directly. Going this route, would required duplicating logic that is possibly already defined in it's keeper.

yes, we have to carefully manage temporary states in memory. cosmos-sdk do have a CacheContext, which I don't think it can play well with StateDB, and deeply nested CacheContext perform very bad.

yihuang avatar Sep 22 '22 17:09 yihuang

Seems like bad developer UX, there has to be a way to get it to work like how CosmWasm works. That should be the end goal for Stateful Precompiles IMO.

itsdevbear avatar Sep 22 '22 17:09 itsdevbear

Having an EthTx revert, revert the whole Cosmos Txn would resolve this issue, no?

itsdevbear avatar Sep 22 '22 17:09 itsdevbear

Seems like bad developer UX, there has to be a way to get it to work like how CosmWasm works. That should be the end goal for Stateful Precompiles IMO.

I think CosmWasm call native functionalities by passing async messages, similar to how we do the log hooks currently.

yihuang avatar Sep 22 '22 17:09 yihuang

Having an EthTx revert, revert the whole Cosmos Txn would resolve this issue, no?

The exception revert can happen in nested way.

yihuang avatar Sep 22 '22 17:09 yihuang

Needless to say, I think the approach should be attempt to get synchronous ability to call Cosmos SDK modules from the evm. Having to duplicate all the logic seems like a massive way to introduce issues.

itsdevbear avatar Sep 22 '22 17:09 itsdevbear

@yihuang correct. But the issue is that we can't call arbitrary module logic, in the way that we can pass CosmosMsg's from CosmWasm VM to the module directly. Going this route, would required duplicating logic that is possibly already defined in it's keeper.

yes, we have to carefully manage temporary states in memory. cosmos-sdk do have a CacheContext, which I don't think it can play well with StateDB, and deeply nested CacheContext perform very bad.

I read your implementation carefully and think that if you use CacheContext to manage the state of the precompiled contract, it can greatly reduce the difficulty of its development. As for the nesting problem you mentioned, I don't understand how it is generated. If we use independent execution of each transaction CacheContext (and only one is provided when initializing the precompiled contract), and writing to the cache when the transaction is completed, will there still be a nesting problem(Maybe I don't understand enough)? For example like this:

func (k *Keeper) ApplyTransaction(ctx sdk.Context, tx *ethtypes.Transaction) (*types.MsgEthereumTxResponse, error) {
   // other logic
        contractAddresses := make(map[common.Address]vm.PrecompiledContract, len(k.precompiledContractCreators))
	    if len(k.precompiledContractCreators) == 0 {
		    return contractAddresses
	    }

	cacheCtx, writeCache := ctx.CacheContext()
	k.writeCache = writeCache
	for key, create := range k.precompiledContractCreators {
		contractAddresses[key] = create(cacheCtx)
	}
     //TODO: then inject contractAddresses into go-etherum's PrecompiledContract global variable

        // call contract
        ret, leftoverGas, vmErr = evm.Call(sender, *msg.To(), msg.Data(), leftoverGas, msg.Value())
    
        if vmErr == nil && k.writeCache != nil {
		    k.writeCache()
		    k.writeCache = nil
        }

 //

}

But there is a problem with this, the caller cannot be obtained in the precompiled contract (origin can be injected using CacheContext)

dreamer-zq avatar Oct 21 '22 02:10 dreamer-zq

@dreamer-zq we are playing with maintaing a new CacheCtx per precompile call and then commiting it on successful return of the precompile.

This localized CacheCtx gets commited to said first CacheCtx that represents the state fo the chain prior to anything occuring.

Then at the end of everything, we can check to see if anything reverted and then just jump back to the original Cache, without the use of nesting.

Should have a working MVP soon.

itsdevbear avatar Oct 21 '22 02:10 itsdevbear

@dreamer-zq we are playing with maintaing a new CacheCtx per precompile call and then commiting it on successful return of the precompile.

This localized CacheCtx gets commited to said first CacheCtx that represents the state fo the chain prior to anything occuring.

Then at the end of everything, we can check to see if anything reverted and then just jump back to the original Cache, without the use of nesting.

Should have a working MVP soon.

Does it work with nested message call? Say tx calls contract1, contract1 calls contract2, contract2 calls precompile, precompile success, but the contract2 call failed eventually, contract1 recovered the error and do something else, can we revert what the precompile modifies without reverting the whole tx?

yihuang avatar Oct 21 '22 04:10 yihuang

Does it work with nested message call? Say tx calls contract1, contract1 calls contract2, contract2 calls precompile, precompile success, but the contract2 call failed eventually, contract1 recovered the error and do something else, can we revert what the precompile modifies without reverting the whole tx?

Why not rollback all operations? No matter how many calls between contracts are made internally, for the user, these are all one transaction, they should all be the same context, why only part of the data is rolled back?

dreamer-zq avatar Oct 21 '22 05:10 dreamer-zq

Does it work with nested message call? Say tx calls contract1, contract1 calls contract2, contract2 calls precompile, precompile success, but the contract2 call failed eventually, contract1 recovered the error and do something else, can we revert what the precompile modifies without reverting the whole tx?

Why not rollback all operations? No matter how many calls between contracts are made internally, for the user, these are all one transaction, they should all be the same context, why only part of the data is rolled back?

That's how EVM supposed to works, let me show a more concrete example:

contract Caller {
  Callee addr;
  uint state;
  function test() {
    (bool success, bytes memory data) = addr.call(abi.encodeWithSignature("callNative()"));
    if !success {
      // ignore the error and move on
      state = 1;
    }
  }
}

contract Callee {
  function callNative() {
    precompile.BankSend(...);
    require(false, "for fun");
  }
}

So we call Caller.test(), it calls Callee.callNative(), which calls precompile.BankSend, the precompile success, but callNative fail after, so the whole callNative call should be reverted, but Caller can ignore the error and move on to modify some other states, even call other precompiles, those modifications should stay, and the tx as a whole is a success one. And these kind of message calls can be nested deeper.

yihuang avatar Oct 21 '22 06:10 yihuang

Does it work with nested message call? Say tx calls contract1, contract1 calls contract2, contract2 calls precompile, precompile success, but the contract2 call failed eventually, contract1 recovered the error and do something else, can we revert what the precompile modifies without reverting the whole tx?

Why not rollback all operations? No matter how many calls between contracts are made internally, for the user, these are all one transaction, they should all be the same context, why only part of the data is rolled back?

That's how EVM supposed to works, let me show a more concrete example:

contract Caller {
  Callee addr;
  uint state;
  function test() {
    (bool success, bytes memory data) = addr.call(abi.encodeWithSignature("callNative()"));
    if !success {
      // ignore the error and move on
      state = 1;
    }
  }
}

contract Callee {
  function callNative() {
    precompile.BankSend(...);
    require(false, "for fun");
  }
}

So we call Caller.test(), it calls Callee.callNative(), which calls precompile.BankSend, the precompile success, but callNative fail after, so the whole callNative call is reverted, but Caller can ignore the error and move on to modify some other states, even call other precompiles, those modifications should stay, and the tx as a whole is a success one.

Thank you for your explanation, I understand what you mean. But this error should be caused by human, unless there is such a need

dreamer-zq avatar Oct 21 '22 07:10 dreamer-zq

But this error should be caused by human, unless there is such a need

It's not a human error, contracts call each other all the time, that how the defi legos works, we need to conform to the evm semantics, otherwise the contract won't work as expected.

yihuang avatar Oct 21 '22 07:10 yihuang

But this error should be caused by human, unless there is such a need

It's not a human error, contracts call each other all the time, that how the defi legos works, we need to conform to the evm semantics, otherwise the contract won't work as expected.

The logic of evm is indeed like this, if we follow this logic, then it means that the go-etherum code must be modified

dreamer-zq avatar Oct 21 '22 09:10 dreamer-zq

@dreamer-zq we need to handle the case where someone calls the precompile with a try catch, and we need to revert the execution of the current contract call, but the overall txn may still succeed.

itsdevbear avatar Oct 21 '22 14:10 itsdevbear

@CalBera is working on it on our side, I'll allow him to chime in.

itsdevbear avatar Oct 21 '22 14:10 itsdevbear

@dreamer-zq we need to handle the case where someone calls the precompile with a try catch, and we need to revert the execution of the current contract call, but the overall txn may still succeed.

thank you very much, I understand

dreamer-zq avatar Oct 24 '22 01:10 dreamer-zq

@dreamer-zq we need to handle the case where someone calls the precompile with a try catch, and we need to revert the execution of the current contract call, but the overall txn may still succeed.

@itsdevbear how long will it take to complete, if needed, we can also participate

dreamer-zq avatar Oct 24 '22 03:10 dreamer-zq

@yihuang @dreamer-zq I've proposed an approach to use CacheContext and still handle nested reverts. Essentially, by implementing the cache contexts in the statedb journals, cache contexts can follow the snapshot/revert logic properly, even in nested contract calls/reverts! Please take a look at this doc, which provides further explanation on this approach.

A first attempt Proof of Concept implementation can be found in this PR, notable files include: x/evm/statedb/interfaces.go, x/evm/statedb/statedb.go, x/evm/vm/geth/geth.go, and x/evm/types/precompiles.go. Note: this PR works on the latest version of geth without modifications, but requires shadowing many methods.

calbera avatar Oct 28 '22 19:10 calbera

@yihuang @dreamer-zq I've proposed an approach to use CacheContext and still handle nested reverts. Essentially, by implementing the cache contexts in the statedb journals, cache contexts can follow the snapshot/revert logic properly, even in nested contract calls/reverts! Please take a look at this doc, which provides further explanation on this approach.

A first attempt Proof of Concept implementation can be found in this PR, notable files include: x/evm/statedb/interfaces.go, x/evm/statedb/statedb.go, x/evm/vm/geth/geth.go, and x/evm/types/precompiles.go. Note: this PR works on the latest version of geth without modifications, but requires shadowing many methods.

"The max depth of caches is equal to the number of precompile calls per transaction" awesome, this should be much better than before when we nest cache context for each message call.

I see one gotcha here, the dirty account and balance information stored in statedb can't be accessed by the native calls.

yihuang avatar Oct 29 '22 00:10 yihuang

@yihuang this should be easily solvable since within a single ethereum transaction, the only way that data will be accessed that is not in x/evm is through precompiles and/or native balances, and block.coinbase (I may be missing something).

So I think what we need to do is ensure that whenever x/evm calls into another cosmos module (i.e, block.coinbase touches the staking keeper, we need to ensure that that call is also using the latest CacheContext.

TLDR use the latest CacheContext EVERYWHERE in x/evm not just precompiles

@calbera

itsdevbear avatar Oct 29 '22 19:10 itsdevbear