solidity
Disallow ``codecopy`` in pure functions (and check for other cases)
Related to https://github.com/ethereum/solidity/issues/8153 and following https://github.com/ethereum/solidity/pull/12256, we should make sure we are actually as strict as we want to be for pure functions.
In particular, we should disallow `codecopy` with 0.9.0, but we should double-check whether there are other cases that we should strengthen as well.
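For illustration, here is a minimal sketch of the kind of function this issue targets. Assuming a 0.8.x compiler that still accepts `codesize` and `codecopy` in `pure` functions, both functions below should be accepted as `pure`, even though their results depend on the deployed bytecode rather than on their arguments. The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

/// Hypothetical contract: assuming a 0.8.x compiler, both functions are
/// accepted as `pure`, although their results depend on the executing code.
contract CodeReadingPure {
    /// Returns the size of the currently executing code.
    function ownCodeSize() external pure returns (uint256 size) {
        assembly {
            size := codesize()
        }
    }

    /// Returns the first 32 bytes of the currently executing code.
    function firstCodeWord() external pure returns (bytes32 word) {
        assembly {
            // Copy 32 bytes from code offset 0 into scratch space at 0x00.
            codecopy(0x00, 0x00, 0x20)
            word := mload(0x00)
        }
    }
}
```

Two deployments with different bytecode return different values from the same "pure" call, which is exactly what disallowing these opcodes is meant to prevent.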
Isn't disallowing access to `msg.data` also part of this issue?
I mean, I could also just do it right away; that may be the safest way to avoid missing it again.
> Isn't disallowing access to `msg.data` also part of this issue?
If we agree that we should, then yes, absolutely :-)! I'm all for that, but I wasn't sure we had consensus about that.
Hm... I'm just looking through the instruction list... and hit CALLDATALOAD :-)...
That one we can't really disallow, I guess... but if we allow it, we might as well keep allowing msg.data...
The problem is still that "externally pure" is different from "internally pure". I can evaluate an external function call at compile time, even if it involves msg.data and calldataload, but I cannot do that for an internal one...
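A minimal sketch of the internal/external distinction, assuming a 0.8.x compiler (which accepts `msg.data` in `pure` functions): the internal helper takes no arguments, yet returns different values depending on which external function is currently executing. With standard ABI encoding, `a()` sees 4 bytes of calldata (just the selector) and `b(uint256)` sees 36. All names are hypothetical.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

/// Hypothetical contract illustrating "internally pure" vs "externally pure".
contract InternalVsExternalPure {
    /// Takes no arguments, yet its result depends on the calldata of
    /// whichever external call is currently executing.
    function callDataLength() internal pure returns (uint256) {
        return msg.data.length;
    }

    /// Called as a(): calldata is only the 4-byte selector, so this returns 4.
    function a() external pure returns (uint256) {
        return callDataLength();
    }

    /// Called as b(x): selector plus one 32-byte ABI word, so this returns 36.
    function b(uint256) external pure returns (uint256) {
        return callDataLength();
    }
}
```

Each external entry point on its own is still a deterministic function of its calldata, so it can be evaluated at compile time from the outside; the internal helper cannot be evaluated from its (empty) argument list alone.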
Well, basically external pure functions can have different rules than private/public/internal ones. We had another issue tracking the memory mutability of pure; that one should be revived as well.
Hm, yeah... I actually thought we couldn't just use different rules because public functions are both internal and external, but it actually makes perfect sense to apply the stricter internal rules to public ones...
Anyway, I pushed #12261 for the obvious cases I saw that shouldn't be pure, i.e. `codecopy` and `codesize`. The rest we can do once we decide whether to split the external and internal analysis, or what else to do.
What about internal functions that have calldata parameters? Can they use inline assembly and thus calldataload to access their parameters?
Yeah, which is a problem. If we allow loading something from calldata, we effectively allow loading anything from calldata, and thus might as well keep msg.data pure...
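A sketch of that problem, assuming a 0.8.x compiler where `calldataload` is permitted in `pure` functions: the first helper reads its own calldata parameter as intended, but nothing stops the second one from reading any position in the calldata. The library and function names are hypothetical.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

/// Hypothetical library: once calldataload is allowed so that internal pure
/// functions can read their calldata parameters, nothing confines the offset
/// to those parameters.
library CalldataReads {
    /// Reads the first 32 bytes of a calldata parameter (the intended use).
    /// If data is shorter than 32 bytes, the word also contains whatever
    /// calldata follows it (or zero padding beyond the end of calldata).
    function firstWord(bytes calldata data) internal pure returns (bytes32 word) {
        assembly {
            word := calldataload(data.offset)
        }
    }

    /// Reads an arbitrary position of the whole calldata, unrelated to any
    /// argument; effectively the same as indexing into msg.data.
    function wordAt(uint256 pos) internal pure returns (bytes32 word) {
        assembly {
            word := calldataload(pos)
        }
    }
}
```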
Then again, I'm not sure we should ever even try to actually evaluate inline assembly at compile time anyway...
So maybe restricting pure to the notion of "external pure" only is enough (i.e. disallow accessing code, but generally allow accessing calldata however one likes, including via msg.data), and the compiler can decide whether an internal pure function can be evaluated at compile time without a special syntactic marker... but I'm not sure...
> but we should double-check whether there are other cases that we should strengthen as well.
I went through the list of documented opcodes while testing #12861. The only opcodes that seem relevant to this issue are:
- `returndatasize()` (disallowed after #12861)
- `returndatacopy()` (disallowed after #12861)
- `calldataload()` (discussed above)
- `calldatasize()` (discussed above)
- `calldatacopy()` (discussed above)
- `codesize()` (disallowed after #12261)
- `codecopy()` (disallowed after #12261)
So I think we're done here. Can we close this issue now?
Clarification on the status of this issue: since #12261 was only merged into the breaking branch and it's uncertain what will happen to that branch, this is technically still not done.
Since the concept of pure does not exist at the EVM level, I have always wondered what specific problem the `pure` keyword in Solidity is intended to solve. In practice, the compiler mainly suggests switching from `view` to `pure` (or vice versa) without providing deeper insight into the underlying benefits. If the objective is to prohibit access to EVM state, calldata, or code, the current implementation still falls short, as it restricts valuable precompiles such as ecrecover() and sha256(), which exhibit genuinely pure behavior and should not be limited.
I propose redefining `pure` to mean that, given the same inputs, the function always produces the same output and has no side effects (i.e., it is deterministic and side-effect-free), even when called from different contexts. Furthermore, access to calldata should be restricted in pure functions, so that their behavior depends solely on the explicit function arguments, not on the entire calldata or slices of it. As for `this` and `code(this)`, they should be excluded only if the goal is to allow `delegatecall` to pure functions while retaining their properties, though I'm uncertain whether this is a good idea.
I propose introducing special handling for external "purecall" constructs directly at the Solidity compiler level, since the EVM has no native opcode or mechanism that distinguishes purely deterministic external invocations from general staticcalls. The compiler would recognize and validate calls to known precompile addresses as inherently pure, allowing them within pure functions without violating purity guarantees, as long as they remain side-effect-free and deterministic. All standard Ethereum precompiles qualify for this treatment instead of being restricted to staticcall semantics: ECRecover (0x01), SHA256 (0x02), RIPEMD160 (0x03), Identity (0x04), ModExp (0x05), BN256Add (0x06), BN256ScalarMul (0x07), BN256Pairing (0x08), and Blake2f (0x09), each of which processes its inputs deterministically without accessing or modifying blockchain state.
To summarize, `pure` behavior could be described like this:
> A `pure` function produces the same output for the same inputs and exhibits no side effects, even when called from different contexts.
| Aspect | Current Pure (Solidity 0.8.x) | Proposed Pure |
|---|---|---|
| this | ✗ | ✓ |
| storage | ✗ | ✗ |
| calldata | ✓ | ✗ |
| code(this) | ✓ | ✓ |
| code(address) | ✗ | ✗ |
| purecall | (no current behavior) | ✓ |
| staticcall | ✓ | ✗ |
| memory | ✓ | ✓ |
| transient | ✗ | ✗ |
| block.* | ✗ | ✗ |
| tx.* | ✗ | ✗ |
| returndata | ✗ | only after purecall |
> I have always wondered what specific problem the `pure` keyword in Solidity is intended to solve.
AFAIK there was never a fully consistent vision for it. It has always been a bit torn between "not reading storage" and "compile-time constant". What you're describing is one way we could do it, I guess. What do you think about going all the way in the other direction, towards compile-time evaluation (https://github.com/argotorg/solidity/issues/3157#issuecomment-3293684297)? I mean, we will introduce compile-time evaluation of user-defined functions at some point anyway. The question would be whether that should replace pure or be a separate, stricter level below it.