Disallow ``codecopy`` in pure functions (and check for other cases)

Open ekpyron opened this issue 4 years ago • 10 comments

Related to https://github.com/ethereum/solidity/issues/8153 and following https://github.com/ethereum/solidity/pull/12256 we should make sure we actually are as strict as we want to be for pure functions.

In particular, we should disallow codecopy with 0.9.0, but we should double-check if there are other cases that we should strengthen as well.
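
To illustrate the case in the title, here is a minimal sketch of a `pure` function that reads the contract's own code through inline assembly. The contract and function names are made up for illustration. As far as I know, 0.8.x compilers accept this, because `codecopy` is not on the list of instructions rejected in `pure` functions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, not taken from the issue.
contract CodeReader {
    // Declared pure, yet the result depends on the deployed bytecode.
    function firstCodeWord() public pure returns (bytes32 word) {
        assembly {
            // Copy the first 32 bytes of this contract's code into scratch space.
            codecopy(0x00, 0x00, 0x20)
            word := mload(0x00)
        }
    }
}
```

The same function can return different values in two contracts with different bytecode, which is why a stricter reading of `pure` would reject it.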

ekpyron avatar Nov 08 '21 16:11 ekpyron

Isn't disallowing access to msg.data also part of this issue?

chriseth avatar Nov 08 '21 16:11 chriseth

I mean, I can also just do it right away, maybe that's safest for not missing it again.

ekpyron avatar Nov 08 '21 16:11 ekpyron

> Isn't disallowing access to msg.data also part of this issue?

If we agree that we should, then yes, absolutely :-)! I'm all for that, but I wasn't sure we had consensus about that.

ekpyron avatar Nov 08 '21 16:11 ekpyron

Hm... I'm just looking through the instruction list... and hit CALLDATALOAD :-)... That one we can't really disallow I guess... but if we allow it, we might as well keep msg.data...

The problem is still that "externally pure" is different from "internally pure". An external function call I can compile-time evaluate, even if it involves msg.data and calldataload - but an internal one I cannot...
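
The gap between "externally pure" and "internally pure" can be sketched as follows, assuming the current 0.8.x rule that `msg.data` is allowed in `pure` functions. The names `outer` and `inner` are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, not taken from the issue.
contract CalldataPeek {
    // Externally pure: the result is fully determined by the calldata of this call.
    function outer(uint256 x) external pure returns (uint256) {
        return inner(x);
    }

    // Not internally pure: the result depends on the calldata of the
    // enclosing external call, not only on the argument x.
    function inner(uint256 x) internal pure returns (uint256) {
        return x + msg.data.length;
    }
}
```

Given only `x`, a compiler could not evaluate `inner(x)` at compile time, while a whole external call to `outer` with known calldata could in principle be evaluated.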

ekpyron avatar Nov 08 '21 16:11 ekpyron

Well, basically external pure can have different rules than private/public/internal. We had another issue for tracking the memory-mutability of pure; that should also be revived.

axic avatar Nov 08 '21 16:11 axic

Hm, yeah... I actually thought we couldn't just use different rules due to public being both internal and external, but it actually makes perfect sense to apply the stricter internal rules to public...

Anyways, I pushed #12261 for the obvious cases I saw that shouldn't be pure, i.e. codecopy and codesize - the rest we can do once we decide whether to split external and internal analysis or what else to do.

ekpyron avatar Nov 08 '21 17:11 ekpyron

What about internal functions that have calldata parameters? Can they use inline assembly and thus calldataload to access their parameters?

chriseth avatar Nov 09 '21 08:11 chriseth

Yeah - which is a problem. If we allow loading something from calldata, we effectively allow loading anything from calldata and thus might as well keep msg.data pure... Then again, I'm not sure if we should ever even try to actually compile-time-evaluate inline assembly anyways...

So maybe restricting pure to the notion of "external pure" only is enough (i.e. disallow accessing code, but generally allowing to access calldata however one likes, including via msg.data) - and whether an internal pure function can be compile-time evaluated we can have the compiler decide without a special syntactic marker... but I'm not sure...
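
A minimal sketch of the calldata-parameter problem, with hypothetical names: once `calldataload` is allowed so that an internal function can read its own calldata parameter, nothing stops it from reading any other calldata offset.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, not taken from the issue.
contract CalldataParams {
    function entry(bytes calldata data) external pure returns (bytes32, bytes32) {
        return (firstWord(data), wordAt(0));
    }

    // Reads the first 32 bytes of its own calldata parameter.
    // (Reads past the end of data if data.length < 32.)
    function firstWord(bytes calldata data) internal pure returns (bytes32 w) {
        assembly {
            w := calldataload(data.offset)
        }
    }

    // Same opcode, but reads an arbitrary calldata position that is
    // unrelated to any parameter of this function.
    function wordAt(uint256 pos) internal pure returns (bytes32 w) {
        assembly {
            w := calldataload(pos)
        }
    }
}
```

Both internal functions use the same instruction, so a checker that works per opcode cannot accept the first while rejecting the second.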

ekpyron avatar Nov 09 '21 09:11 ekpyron

> but we should double-check if there are other cases that we should strengthen as well.

I went through the list of documented opcodes while testing #12861. The only opcodes that seem relevant to this issue are:

  • returndatasize() (disallowed after #12861)
  • returndatacopy() (disallowed after #12861)
  • calldataload() (discussed above)
  • calldatasize() (discussed above)
  • calldatacopy() (discussed above)
  • codesize() (disallowed after #12261)
  • codecopy() (disallowed after #12261)

So I think we're done here. Can we close this issue now?

cameel avatar Mar 25 '22 19:03 cameel

Clarification on the status of this issue: since #12261 was only merged into the breaking branch and it's uncertain what will happen to that branch, this is technically still not done.

cameel avatar Apr 10 '25 17:04 cameel

Since the concept of pure does not exist at the EVM level, I have always wondered what specific problem the pure keyword in Solidity is intended to solve. In practice, the compiler primarily suggests toggling between view and pure (or vice versa) without providing deeper insights into the underlying benefits. If the objective is to prohibit access to EVM state, calldata, or code, the current implementation still falls short, as it restricts valuable precompiles such as ecrecover() and sha256(), which demonstrate genuinely pure behavior and should not be limited.

I propose redefining pure to signify that, given the same inputs, the function consistently produces the same output and exhibits no side effects (i.e., it is deterministic and side-effect-free), even when called from different contexts. Furthermore, access to calldata should be restricted in pure functions to enforce that behavior depends solely on explicit function arguments, not on the entire calldata or its slices. Regarding this and code(this), it should be excluded only if the goal is to allow delegatecall to pure functions while retaining their properties, though I'm uncertain if this is a good idea.

I propose introducing special handling for external "purecall" constructs directly at the Solidity compiler level, since no equivalent opcode or mechanism exists natively in the EVM to distinguish purely deterministic external invocations from general staticcalls. This would involve the compiler recognizing and validating calls to known precompile addresses as inherently pure, allowing them within pure functions without violating purity guarantees, as long as they remain side-effect-free and deterministic. All standard Ethereum precompiles qualify for this treatment instead of being restricted to staticcall semantics: these include ECRecover (0x01), SHA256 (0x02), RIPEMD160 (0x03), Identity (0x04), ModExp (0x05), BN256Add (0x06), BN256ScalarMul (0x07), BN256Pairing (0x08), and Blake2f (0x09), each of which processes inputs deterministically without accessing or modifying blockchain state.
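
To make the precompile point concrete, here is a sketch under the assumption that current 0.8.x rules apply. The built-in `sha256()` is, to my knowledge, already accepted in `pure` functions, while a raw `staticcall` to the same precompile address forces the function to be `view`. The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, not taken from the issue.
contract PrecompileCalls {
    // Built-in wrapper around the SHA256 precompile: allowed in a pure function.
    function hashBuiltin(bytes memory data) public pure returns (bytes32) {
        return sha256(data);
    }

    // Raw call to the same precompile (address 0x02): staticcall is not
    // allowed in pure functions, so this has to be declared view.
    function hashRaw(bytes memory data) public view returns (bytes32 digest) {
        (bool ok, bytes memory out) = address(0x02).staticcall(data);
        require(ok && out.length == 32, "sha256 precompile failed");
        digest = abi.decode(out, (bytes32));
    }
}
```

A "purecall" as proposed above would let the second form be `pure` as well, and would extend the same treatment to precompiles that have no built-in wrapper, such as ModExp.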

Summarizing, `pure` behavior could be explained like this:

A pure function produces the same output for the same inputs and exhibits no side effects, even when called from different contexts.

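| Aspect | Current Pure (Solidity 0.8.x) | Proposed Pure |
|---|---|---|
| this | | |
| storage | | |
| calldata | | |
| code(this) | | |
| code(address) | | |
| purecall | (no current behavior) | |
| staticcall | | |
| memory | | |
| transient | | |
| block.* | | |
| tx.* | | |
| returndata | | only after purecall |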

k06a avatar Sep 01 '25 18:09 k06a

> I have always wondered what specific problem the pure keyword in Solidity is intended to solve.

AFAIK there never was a fully consistent vision for it. It has always been a bit schizophrenic between "not reading storage" and "compile-time constant". What you're describing is one way we could do it I guess. What do you think about going all the way in the other direction towards compile-time evaluation though (https://github.com/argotorg/solidity/issues/3157#issuecomment-3293684297)? I mean, we will introduce compile-time evaluation of user-defined functions at some point anyway. The question would be whether that should replace pure or be a separate, stricter level below it.

cameel avatar Sep 15 '25 20:09 cameel