solidity icon indicating copy to clipboard operation
solidity copied to clipboard

Language feature: disallow state-changing effects after an external call by default

Open pcaversaccio opened this issue 3 years ago • 37 comments

Abstract

Generally disallowing state-changing effects after an external function call and enabling the possibility to mark functions that specifically do this.

Motivation

I started this discussion on Twitter after another reentrancy attack (Cc: @chriseth). Reentrancy attacks are ubiquitous and even though there are toolings available (e.g. Slither) that conduct a static analysis it requires an initial setup as well as a proper understanding of how to interpret the results. In order to make the Solidity development more secure and sustainable I feel it's time to finally introduce such a language feature that disallows state-changing effects after an external function call and mark functions that specifically do this.

Specification

All state-changing effects after an external function call are disallowed. If you want to allow however such a possibility, we introduce a new modifier unprotected whose default value is false.

Backwards Compatibility

The code that previously compiled fine will not compile anymore if there is a state-change effect after an external call and the new modifier unprotected is unknown, so it's a breaking change.

pcaversaccio avatar May 10 '22 08:05 pcaversaccio

Just a comment on the name of the modifier: I don't like opinionated words like "safe" or "protected" because they give a false impression of safety or protection, we should find a more neutral one.

Also more generally: Also wrt. free functions, it might be a good idea to explicitly pass/provide something like a "context" as a parameter to functions that allows external calls (or maybe even state changes without external calls). In this case, we could have a special property on this context that allows state changes after an external call. By default, the context would transform into a "constant state" context after an external call - there are the linear types again.

Also something to consider more specifically to this issue: Is it OK to do another external call after the first external call? What about a delegatecall? Even if we do not use delegatecall, how do we distinguish a "data contract" that is associated to the current contract from a potentially malicious external contract?

After all these questions, I'm more and more inclined that this should rather be "off-loaded" to user code and the compiler should instead provide the required features for the type system.

chriseth avatar May 10 '22 08:05 chriseth

You are right about the name of the modifier - after thinking through it today maybe we even do not need a modifier but can use the unchecked block syntax directly. I think in a broader sense this language feature is consistent with the checked arithmetic introduced in solc version 0.8.0 since it also wants to prevent unwanted behaviour that could be exploited (one of the major reasons behind introducing libraries such as SafeMath were smart contract exploits based on overflow/underflow vulnerabilities).

There are three types of reentrancy:

  • Single Function Reentrancy
  • Cross-Function Reentrancy
  • Cross-Contract Reentrancy

For the cross-contract reentrancy, this one can happen when a state from one contract is used in another contract, but that state is not fully updated before getting called.

The conditions required for the cross-contract reentrancy to be possible are as follows:

  • The execution flow can be controlled by the attacker to manipulate the contract state.
  • The value of the state in the contract is shared or used in another contract.

The best outcome for this language feature would be that one could write the contracts without bothering about these three types by default. Of course, we can always argue to "offload" all the responsibility to the developers but past experience and events showcase blatantly that this does not work as intended.

Coming back to your context idea - I really like it but couldn't we simply achieve that via the unchecked block syntax. I.e. an unchecked block does not only not control for an unrestricted integer that falls outside the range of the result type but also allows for a "non-constant state". By default, however, the context is a "constant state".

Also something to consider more specifically to this issue: Is it OK to do another external call after the first external call? What about a delegatecall? Even if we do not use delegatecall, how do we distinguish a "data contract" that is associated to the current contract from a potentially malicious external contract?

I'm not yet sure what would be the best design here tbh.

pcaversaccio avatar May 10 '22 20:05 pcaversaccio

I would like to quickly add some further thoughts behind this language feature: The main question is whether we should build the language in a via negativa or via positiva way. I strongly tend based on the experience and events over the last 6 years to the first one. If you look at all the exploits that happened over the last 6 years (including DAO) we haven't got really smarter w.r.t. to reentrancy attacks as it seems (I also blame a little bit the auditors who push for the low-level calls instead e.g. transfer). Too many devs are still not paranoid enough! And Solidity already started implementing such safety features via unchecked for over/underflows. It's definitely a tradeoff here and my reasoning is practice-driven.

pcaversaccio avatar May 17 '22 13:05 pcaversaccio

@cameel @hrkrshnn what are your opinions on my thoughts here?

pcaversaccio avatar Jun 08 '22 07:06 pcaversaccio

I think this shouldn't go into the language with safe/protected keywords and such, and should rather have similar behavior as checked arithmetic.

IMO, reentrancy guards could be added by default for every external call, and bypassed if the call is in an unchecked block. This also follows Rust's unsafe style which seems popular.

leonardoalt avatar Jul 04 '22 09:07 leonardoalt

https://github.com/ethereum/solidity/issues/13124 is a dup of this issue in general but offers two different approaches:

  • Aggressive: basically what I said above
  • Conservative: try to detect whether state changes are performed after external calls, and if yes add the reentrancy guards.

Copying what I said in the other issue:

You could combine both approaches, eg have the aggressive approach by default which can be bypassed by the user if the external call is in an unchecked block, and the compiler can also choose to optimize it away if it detects the conditions in the conservative approach.

leonardoalt avatar Jul 04 '22 09:07 leonardoalt

Standard reachability analysis should be able to determine quite easily whether any given line can modify state or call a function in another contract. Then Solidity just needs to ensure that every function has a partition between state modification first, and external calls second.

If you want the user to have to explicitly signify whether a function updates state, you already have that: non-pure/view. If you want the user to have to explicitly signify which functions can call other contracts, you could add a caller modifier or something. Then you could not call a state modifier function or modify state after a caller function or an address.call within any function -- and any function that calls a caller function or calls address.call also itself becomes a caller function.

lukehutch avatar Jul 04 '22 10:07 lukehutch

I personally really dislike the idea of function modifiers for reentrancy. It's too binary and does not represent the amount of different use cases. Moreover:

  • the strategy you mention first can yield false positives for safe cases that do not conform exactly to this pattern
  • I understand your second paragraph but it sounds really confusing. Imagine documenting/implementing this, it's gonna be full of edge cases and potentially make the analysis more dangerous/confusing.

Edit:

  • I'm not against the reachability analysis. I think it is useful as an optimization for cases that are safe for sure.
  • I just think the simplest would be to add a guard to every call by default with the possibility of removing it. No extra keyword, no complicated analysis besides the optimization above, no edge cases. Plus it conforms to a standard that the language already has.

leonardoalt avatar Jul 04 '22 10:07 leonardoalt

What I proposed is already basically what Slither does (minus the modifiers), but Solidity should give these warnings rather than Slither, because this is such a serious source of security vulnerabilities.

Yes, in the general case precise reachability analysis is uncomputable, unless you use conservative logic that looks to see whether something might be reachable given some specific runtime control flow (then reachability analysis becomes simple and computable).

Re. the modifiers, there's a reason why Solidity made visibility modifiers and mutation modifiers required a few versions ago. Being explicit establishes a contract between the programmer and the compiler. And what I explained in my 2nd paragraph is exactly what Solidity already does (albeit with inverted logic) with mutation modifiers for pure/view functions. So basically the machinery is all already there in the compiler to implement this simply.

lukehutch avatar Jul 04 '22 10:07 lukehutch

It will be much easier to implement robust automatic runtime protections against reentrancy attacks at runtime once EIP-1153 is merged, because it will allow for flags to be set whose lifecycle is bound to a transaction.

However, I believe the vast majority of reentrancy attack vulnerabilities can be detected and prevented through static analysis.

lukehutch avatar Jul 16 '22 20:07 lukehutch

It will be much easier to implement robust automatic runtime protections against reentrancy attacks at runtime once EIP-1153 is merged, because it will allow for flags to be set whose lifecycle is bound to a transaction.

Are you confident that's gonna go in? I personally am not.

However, I believe the vast majority of reentrancy attack vulnerabilities can be detected and prevented through static analysis.

Agree that potential reentrancy can be detected with syntactic static analysis. But what about false positives?

I think there are multiple ideas here that are complementary and provide different levels of safety and user experience. We need to get more people into this discussion, not only from the team. We also need to gather actual data on how many / what type of contracts would benefit from each proposed solution, and how many would be made worse / more annoying.

leonardoalt avatar Jul 17 '22 08:07 leonardoalt

As quickly discussed with @leonardoalt it would be good to compile a list of reentrancy attacks and to understand what kind of code design pattern would fit best. Also, IMHO we should not rely on EIP-1153 to be finalised and implemented soon - let's take the current status quo of the EVM and solc as the valid principles for the solution-finding process. Furthermore, I also think we need to be careful to jump to conclusions too fast. Past code patterns should not necessarily imply what future code patterns will look like (also wrt what is planned for e.g. solc 0.9.0). Maybe to some extent, engineers must "unlearn" certain design patterns for the greater good.

Let me try to share a common, chronological list of reentrancy attacks including their (mostly) detailed analyses:

Edit: I tweeted about it here and will add further incidents if reported by the community.

Update: Due to the overwhelming positive feedback I have received on Twitter, I decided to create my own repository here that tracks all of the reentrancy attacks. I will try my best to keep everything up to date.

[^1]: To prevent the article from constantly reloading, deactivate JavaScript in your browser. [^2]: We list the attacker's address here for the sake of completeness, but technically the attack was executed with a Near-specific transaction type called "Batch Transaction" and not with a specific exploit contract. [^3]: We list the victim contract, the exploit contract, and the exploit transaction on Arbitrum. However, the same exploit was carried out on Optimism with almost the same amount of loss: Victim contract, Exploit contract, Exploit transaction. [^4]: The same exploit hit another victim with almost the same amount of loss: Victim contract. [^5]: The same exploit hit two other victims with almost the same amount of loss: Victim contract 2, Victim contract 3. [^6]: We list the victim contract, the exploit contract, and the exploit transaction on Optimism. However, the same exploit was carried out on Ethereum, albeit with a smaller loss amount: Victim contract, Exploit contract, Exploit transaction. [^7]: We list the victim contract, the exploit contract, and the exploit transaction on Polygon. However, the same exploit was carried out on Ethereum, albeit with a smaller loss amount: Victim contract, Exploit contract, Exploit transaction.

pcaversaccio avatar Jul 18 '22 13:07 pcaversaccio

This is a great idea. Omni NFT was also an reentrancy attack and it would be great to have it here.

Genysys avatar Jul 18 '22 13:07 Genysys

Adding a mutex to functions that write after external calls would not address cases where separate functions read from stale values or overwrite the same storage value.

Take an example from the SAILFISH paper where adding a mutex is insufficient:

contract Test { 
	mapping(uint => uint) splits;
	mapping(uint => uint) deposits;
	mapping(uint => address payable) payee1;
	mapping(uint => address payable) payee2;
	uint lock;
	modifier nonReentrant() {
		lock = 1;
		_;
		lock = 0;
	}
	function updateSplit(uint id, uint split) public{
		require(split <= 100);
		splits[id] = split;
	}
	// [Step 1]: Set split of ’a’ (id = 0) to 100(%)
        // [Step 4]: Set split of ’a’ (id = 0) to 0(%)
	function splitFunds(uint id) public nonReentrant {
		address payable a = payee1[id];
		address payable b = payee2[id];
		uint depo = deposits[id];
		deposits[id] = 0;

                // [Step 2]: Transfer 100% fund to ’a’
                // [Step 3]: Reenter updateSplit
		a.call{value:(depo * splits[id] / 100)}("");
                 
                 // [Step 5]: Transfer 100% fund to ’b’
		b.transfer(depo * (100 - splits[id]) / 100);
 	}
}

Rather than strictly identifying instances that do not follow check-effects-interact, it would make sense to look for function calls that allow for control flow to be controlled externally (as opposed to explicitly indicated like an if-else statement). I think this is clear if one examines the output of slither Test.sol --print slithir-ssa:

Contract Test
        Function Test.updateSplit(uint256,uint256)
                Expression: require(bool)(split <= 100)
                IRs:
                        TMP_0(bool) = split_1 <= 100
                        TMP_1(None) = SOLIDITY_CALL require(bool)(TMP_0)
                Expression: splits[id] = split
                IRs:
                        REF_0(uint256) -> splits_0[id_1]
                        splits_1(mapping(uint256 => uint256)) := ϕ(['splits_0'])
                        REF_0 (->splits_1) := split_1(uint256)
        Function Test.splitFunds(uint256)
                IRs:
                        splits_2(mapping(uint256 => uint256)) := ϕ(['splits_4', 'splits_1', 'splits_0'])
                        deposits_1(mapping(uint256 => uint256)) := ϕ(['deposits_0', 'deposits_3'])
                        payee1_1(mapping(uint256 => address)) := ϕ(['payee1_2', 'payee1_0'])
                        payee2_1(mapping(uint256 => address)) := ϕ(['payee2_0', 'payee2_2'])
                Expression: a = payee1[id]
                IRs:
                        REF_1(address) -> payee1_2[id_1]
                        a_1(address) := REF_1(address)
                Expression: b = payee2[id]
                IRs:
                        REF_2(address) -> payee2_2[id_1]
                        b_1(address) := REF_2(address)
                Expression: depo = deposits[id]
                IRs:
                        REF_3(uint256) -> deposits_2[id_1]
                        depo_1(uint256) := REF_3(uint256)
                Expression: deposits[id] = 0
                IRs:
                        REF_4(uint256) -> deposits_2[id_1]
                        deposits_3(mapping(uint256 => uint256)) := ϕ(['deposits_2'])
                        REF_4 (->deposits_3) := 0(uint256)
                Expression: a.call{value: (depo * splits[id] / 100)}()
                IRs:
                        REF_6(uint256) -> splits_3[id_1]
                        TMP_2(uint256) = depo_1 (c)* REF_6
                        TMP_3(uint256) = TMP_2 (c)/ 100
                        TUPLE_0(bool,bytes) = LOW_LEVEL_CALL, dest:a_1, function:call, arguments:[''] value:TMP_3 
                        splits_4(mapping(uint256 => uint256)) := ϕ(['splits_3', 'splits_1', 'splits_4'])
                Expression: b.transfer(depo * (100 - splits[id]) / 100)
                IRs:
                        REF_8(uint256) -> splits_4[id_1]
                        TMP_4(uint256) = 100 (c)- REF_8
                        TMP_5(uint256) = depo_1 (c)* TMP_4
                        TMP_6(uint256) = TMP_5 (c)/ 100
                        Transfer dest:b_1 value:TMP_6
                Expression: nonReentrant()
                IRs:
                        MODIFIER_CALL, Test.nonReentrant()()

Upon examination, after a.call the mapping, splits, has a phi value, splits_1, that indicates the value can be updated during an external call in updateSplit and used to withdraw more funds than intended. This is essentially a source of "undefined behavior" that results from a developer not being explicit enough about the anticipated control flow. For background, Slither's SSA form creates phi variables after external calls (non STATICCALL) because it is a point where control flow merges.

If we treat reentrancy as an undefined behavior problem, I think we could find a happy medium that is not cumbersome for developers yet eliminates undefined behavior. This would encourage specification and hopefully reduce false positives/ fighting the compiler.

In order to introduce these sort of enhancements, I think it would make sense to consider the points raised in this issue and move towards a modular compiler design. People could opt in to passes that have more advanced analysis than the Solidity compiler and harden their code as they see fit (this is a common practice for other compiled languages). In addition, pruning infeasible paths would likely be slow and only required to run on "release" builds.

P.S. I also like the use of TLOAD for reentrancy protection and think it would give developers less of a reason to circumvent compiler-generated protections, e.g. turning off reentrancy mitigations to reduce SLOADs.

0xalpharush avatar Jul 18 '22 13:07 0xalpharush

Let me try to share a common, chronological list of reentrancy attacks including their (mostly) detailed analyses:

@pcaversaccio an awesome list -- thanks for compiling this!

Adding a mutex to functions that write after external calls would not address cases where separate functions read from stale values or overwrite the same storage value.

@0xalpharush Correct, it's insufficient to prevent state changes if a parent frame in the call stack is a call to an external contract. You have to prevent state changes if a call was made to an external contract at any previous point in the same transaction. This is why I referred to EIP-1153, since it will provide a supported mechanism for binding state to the life of a transaction.

However, perhaps this can be approximated by calculating

bytes32 transactionId = keccak256(abi.encode(block.chainid, block.number));

then modifiers could be created for functions that call external contracts and functions that update internal state as follows:

contract X {
    bytes32 private extContractLastCalled_transactionId;

    modifier extContractCaller() {
        extContractLastCalled_transactionId = keccak256(abi.encode(block.chainid, block.number));
        _;
    }

    modifier updateState() {
        require (keccak256(abi.encode(block.chainid, block.number)) != extContractLastCalled_transactionId,
                    "Can't update state after calling external contract in the same transaction");
        _;
    }
}

I think this would work, but it's not ideal, because

  1. It is coarse-grained (applying at the level of function definitions, rather than per-line). This means the programmer has to be careful to segregate functions into external caller functions and state updater functions. If function modification is done automatically by the compiler using the above pattern, it will have to fail if a function both updates state and calls an external contract.
  2. It fails at runtime, not compiletime (I'm pretty sure 99% of relevant cases could be caught statically, through control flow analysis -- basically anything that is computable, which excludes cases where e.g. you call a data-derived function selector or something).

lukehutch avatar Jul 18 '22 17:07 lukehutch

However, perhaps this can be approximated by calculating

bytes32 transactionId = keccak256(abi.encode(block.chainid, block.number));

Would replacing the mutex modifier with this and having the compiler automatically include it in every call be a reasonable compromise?

Genysys avatar Jul 18 '22 17:07 Genysys

Actually I think transactionId may need to also hash tx.origin, i.e.

bytes32 transactionId = keccak256(abi.encode(block.chainid, block.number, tx.origin));

Although this still doesn't handle the case of a single EOA submitting multiple transactions to be mined in a single block.

Is there any other way to reliably differentiate different transactions?

lukehutch avatar Jul 19 '22 07:07 lukehutch

Just as a quick reference for comparison, Vyper uses a simple @nonreentrant(<key>) decorator (which equals a modifier in Solidity) to prevent reentrancies. I haven't investigated in detail the decorator's behaviour but from the first sight, it would still allow for cross-contract reentrancies as well as for the above example shared by @0xalpharush in case updateSplit would not be protected by such a decorator.

The main point is the following:

Nonreentrancy locks work by setting a specially allocated storage slot to a value on function entrance, and setting it to an value on function exit. On function entrance, if the storage slot is detected to be the value, execution reverts.

pcaversaccio avatar Jul 25 '22 14:07 pcaversaccio

Linking this EIP initiative by @SergioDemianLerner since it can be relevant to this discussion. The proposal is to implement (through a hard fork) a reentrancy guard as a precompile. I don't think we should solve this issue via a hard fork. It's not an EVM problem but a compiler problem. Even if we had such a pre-compile, the discussed issue would still exist by default.

pcaversaccio avatar Jul 29 '22 08:07 pcaversaccio

@pcaversaccio right, a hard fork would solve the problem in the wrong way. Reentrance can and should be statically determined. Slither already does this (although I don't know what degree of complexity it handles).

lukehutch avatar Jul 29 '22 14:07 lukehutch

With ERC777 it can be the other way around see: https://twitter.com/transmissions11/status/1496944873760428058

gpersoon avatar Aug 01 '22 13:08 gpersoon

With ERC777 it can be the other way around see: https://twitter.com/transmissions11/status/1496944873760428058

Good point, and correct, I pointed that out here: https://github.com/OpenZeppelin/openzeppelin-contracts/issues/2620#issuecomment-1156118047

The ERC777 standard is insecure by default, and needs to be deprecated. But presumably there should be a compiler directive in case anyone really wanted to turn off Checks-Effects-Interactions order checking for a given function.

lukehutch avatar Aug 01 '22 19:08 lukehutch

My EIP-5283 represents the simplest hard-fork that provides the mutex functionality that is future-proof for EVM parallel tx execution with fine-grained parallelization. The sooner we introduce such a feature, the higher the number of deployed contracts will be parallelizable. I would also prefer the transient opcodes TLOAD and TSTORE to be used, but adding opcodes is not as easy, since it impacts a lot of tooling.

SergioDemianLerner avatar Aug 02 '22 23:08 SergioDemianLerner

I would like to highlight as part of this discussion a recent reentrancy issue on view functions. Some documentation on this:

  • https://chainsecurity.com/heartbreaks-curve-lp-oracles/
  • https://chainsecurity.com/curve-lp-oracle-manipulation-post-mortem/
  • https://forum.makerdao.com/t/curve-lp-token-oracle-manipulation-vulnerability-emergency-response-postmortem/18011
  • https://forum.makerdao.com/t/curve-lp-token-oracle-manipulation-vulnerability-technical-postmortem/18009
  • https://www.euler.finance/blog/read-only-re-entrancy-guards

A read-only reentrancy reenters view functions which, in contrast to state-altering functions, lack reentrancy guards — enabling the read-only reentrancy. While the reentered contract cannot be affected by its view function, others reading the contract's state can. I'm not yet sure how much of such an attack can be prevented at the compiler level since disallowing reentering view functions by default is not an appropriate solution IMHO. Any opinions?

pcaversaccio avatar Oct 13 '22 18:10 pcaversaccio

A read-only reentrancy reenters view functions which, in contrast to state-altering functions, lack reentrancy guards — enabling the read-only reentrancy. While the reentered contract cannot be affected by its view function, others reading the contract's state can. I'm not yet sure how much of such an attack can be prevented at the compiler level since disallowing reentering view functions by default is not an appropriate solution IMHO. Any opinions?

Core to this problem is the issue that contract storage reads and writes can be interleaved across inter-contract calls. It seems to me that there would be two ways to prevent this issue, both of which would require EVM changes:

(1) Deny read access to the storage of a contract after the first attempt to write to the contract's storage, until the end of the transaction. This could be switched on with a pragma, for extra security, but it would probably feel restrictive.

(2) Add opcodes to the EVM to read the consensus-committed values from storage (i.e. use copy-on-write to keep a copy of the "old" values and the "new" values of the storage of a contract, and have opcodes that only read "old" values. Then, when a transaction is committed, the "new" values are treated as the "old" values for the next transaction). This would require changes to EVM-targeting languages, but it would be possible to treat view functions as quasi-pure if their return value depends only upon old values, as it cannot be affected by any write operation, at least until the transaction is committed.

lukehutch avatar Oct 14 '22 00:10 lukehutch

I don't think iron finance was ever exploited by reentrance attack. The mentioned tx showed normal interaction. Moreover, neither IRON and TITAN token can be used the reentrance attack, even though the pool contract could be reentraned given certian type of pool token.

tangpin213 avatar Nov 27 '22 14:11 tangpin213

@tangpin213 the Iron Finance attack is a little special, you can read about it here. It's debatable whether to list it as a pure reentrancy attack or not since it requires a malicious LP contract - I decided to list it due to the sake of completeness.

TL; DR: The MasterChef contracts all have an emergencyWithdraw function that can be used for a cross-function reentrancy attack. If in any case a malicious contract is added to the pool it will be able to steal LP tokens from the pool.

Also, I would like to highlight that even though a token contract is not subject to reentrancy, a reentrancy attack can still be executed, e.g. across multiple systems (cross-contract reentrancy attack).

pcaversaccio avatar Nov 27 '22 16:11 pcaversaccio

@

pcaversaccio

Thank you for reply. I agree with your thoughts!

tangpin213 avatar Nov 27 '22 16:11 tangpin213

Thank you for doing this post

SaveSnowflake avatar Mar 03 '23 08:03 SaveSnowflake

Since EIP-1153 is confirmed to be included in the upcoming Cancun-Deneb upgrade, we now have transient opcodes at hand that can 'easily' be used to disable reentrancy by default. I really think this is very important and should be included in the 0.9.0 solc roadmap. For the sake of transparency, I cross-reference the current Vyper discussion here as well: https://github.com/vyperlang/vyper/issues/3380.

pcaversaccio avatar May 01 '23 19:05 pcaversaccio