Wallet attribute: Transaction simulation
One of the wallet attributes we want to track is the ability for a wallet to faithfully simulate the effect of transactions, as a user safety feature.
We should come up with a set of reasonable "benchmark" transactions and test that wallets correctly simulate their effects, with the results saved as wallet feature data:
- Sending Ether
- Sending ERC-20 tokens or NFTs
- Signing a Safe wallet transaction
- Some DeFi transactions like depositing into Aave or taking a loan from it
- Others?
Then this can be turned into an attribute.
State transition & deterministic simulation
In the context of large phishing campaigns and drainers-as-a-service, a user’s intent can (and should) be defined by the simulated outcome of a transaction: if a deterministic simulation yields the outcome the user expects, that outcome becomes the ground truth of the user’s intent and a safe basis for signing decisions.
A state transition is the deterministic update of Ethereum’s world state when a transaction (or block) is applied: given the same prior state and identical inputs, the EVM must produce a single, exact resulting state. Therefore a simulation is only trustworthy when it replicates every environment input the EVM or contract can observe; any mismatch, even a subtle one, can change the execution trace, gas used, return data, or whether the call succeeds or reverts, breaking reproducibility and undermining intent validation. The essential environment items to replicate include the following block header fields and gas parameters:
- block.number
- block.timestamp
- baseFee
- lastblockhash
- gasLimit (a gas limit discrepancy between wallet simulations and real transactions is what the exploit described next relies on)
- gasprice
How an exploit leverages the environment-replication discrepancy between wallet simulations and real transactions to deceive users and drain wallets:
The malicious contract has a receive() or fallback() function whose logic implements dual execution paths:

    if (msg.gas <= _receive) {
        // LOW GAS PATH - Real transactions
        0x82a(msg.value); // Steals ETH
        exit;
    } else {
        // HIGH GAS PATH - Simulations only
        msg.sender.call().value(msg.value); // Shows fake refund
    }

In simulation (HIGH GAS PATH):
- Gas used in simulation: 0xf0cb98 = 15,780,760 gas
- Threshold comparison: 15,780,760 > 15,000,000
- Result: fake refund displayed to the user in the wallet simulation
- User perception: "Contract is safe"
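To make the two paths concrete, here is a minimal Python model of that branch. It is only a sketch, not the actual contract: the function name `receive`, the `Outcome` type, and the real-transaction gas figure of 100,000 are assumptions made for illustration; the threshold and the simulation gas value are the ones quoted above.

```python
from dataclasses import dataclass

GAS_THRESHOLD = 15_000_000  # threshold from the comparison above


@dataclass(frozen=True)
class Outcome:
    path: str
    user_balance_change: int  # in wei, relative to before the call


def receive(gas_available: int, value: int) -> Outcome:
    """Toy model of the malicious receive()/fallback() logic (hypothetical)."""
    if gas_available <= GAS_THRESHOLD:
        # Low-gas path: the contract keeps the ETH.
        return Outcome(path="steal", user_balance_change=-value)
    # High-gas path: the ETH is sent straight back to the caller.
    return Outcome(path="refund", user_balance_change=0)


SIMULATION_GAS = 0xF0CB98  # 15,780,760, the value quoted above
REAL_TX_GAS = 100_000      # assumed gas limit of a real transaction

simulated = receive(SIMULATION_GAS, value=10**18)
executed = receive(REAL_TX_GAS, value=10**18)
print(simulated)
print(executed)
```

The same call data produces a refund under the simulator's gas setting and a loss of funds under a realistic one, which is exactly the divergence a wallet UI would fail to show.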
How to track the ability of a wallet to faithfully simulate the effect of transactions:
To track the determinism of a wallet provider’s simulation, we propose deploying a contract that encodes conditional logic based on observable environment variables; the contract emits (or conditionally transfers) depending on the value of those variables so you can detect whether the simulator and on-chain execution followed the same execution path.
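As a rough illustration of the comparison step, the sketch below models the probe in Python rather than Solidity. It simplifies the idea: instead of branching, the toy probe just reports every environment value it observes, and a helper lists the fields where the simulated run and the on-chain run disagree. The `Env` fields, `probe`, `diverging_fields`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Env:
    block_number: int
    block_timestamp: int
    base_fee: int
    gas_limit: int
    gas_price: int


def probe(env: Env) -> dict[str, int]:
    """Toy stand-in for the probe contract: 'emit' every value it observes."""
    return {f.name: getattr(env, f.name) for f in fields(env)}


def diverging_fields(simulated: dict[str, int], onchain: dict[str, int]) -> list[str]:
    """Names of environment fields the simulator did not replicate."""
    return sorted(name for name in onchain if simulated.get(name) != onchain[name])


# Hypothetical values: the simulator uses a zero timestamp and a huge gas limit.
simulated_env = Env(19_000_000, 0, 20 * 10**9, 15_780_760, 25 * 10**9)
onchain_env = Env(19_000_000, 1_700_000_000, 20 * 10**9, 100_000, 25 * 10**9)

mismatches = diverging_fields(probe(simulated_env), probe(onchain_env))
print(mismatches)
```

In practice some fields, such as the block number and timestamp, will always differ slightly between simulation time and inclusion time, so a real scoring rule would need a tolerance for those.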
Thanks @jadjad1289DDFD, this is helpful to know.
My main question is: how can this be measured or verified for a given wallet where we (Walletbeat) may not necessarily have access to the source code that the transaction simulation engine uses? I think this is what your last paragraph addresses, so let me try to reword it to make sure I understand it:
We should create a contract that behaves like scam contracts do, in the sense that it changes its behavior according to environment data which can differ between simulation time and onchain-execution time; namely: block number/timestamp, fee pricing, and transaction gas parameters. This can be a one-off thing (write the contract in Solidity, deploy to mainnet).
Then, for each wallet we want to rate on transaction simulation quality, we should try to ask the wallet to simulate a transaction against this contract. The wallet's transaction simulation feature should correctly identify that the contract behaves differently depending on these parameters, and flag this issue as a likely-scam contract, thereby producing a warning on the wallet UI.
Do I have this right?
If so, a few follow-up questions:
- Is this method sufficient such that it avoids false positives? I am wondering if some non-scam contracts depend on these environment variables for legitimate reasons, that would be expected to be flagged as fraudulent under this methodology.
- Does this method adequately capture all the ways in which transaction simulation quality can be measured? Or are there other ways that a transaction simulation feature can still be fooled?
- Can we distinguish between a transaction engine flagging this Walletbeat test simply based on the contract address ("this is the contract Walletbeat uses for verification, so we'll just blacklist all transactions that use it regardless of what the contract does"), vs based on the actual contract behavior? (If not, that's OK and still a better test than nothing, I'm just wondering out loud here).
This requires clarification: we can access the parameters used by the wallets during the simulation either directly, via an API endpoint, or indirectly.
> Is this method sufficient such that it avoids false positives? I am wondering if some non-scam contracts depend on these environment variables for legitimate reasons, that would be expected to be flagged as fraudulent under this methodology.
Here we are evaluating whether the wallet’s transaction simulation is deterministic, and we give it a score. We test it using different environment-parameter values. The purpose is not to flag malicious contracts, but rather to quantify whether the wallet’s simulation reproduces on-chain state transitions and accurately reflects the user’s intent.
> Does this method adequately capture all the ways in which transaction simulation quality can be measured? Or are there other ways that a transaction simulation feature can still be fooled?
This method allows us to evaluate the determinism of the simulation, i.e. its ability to reflect the user's intent. The objective is to collect and feed the framework with all the parameters and variables that guarantee determinism. At this stage we have the block-header parameters. Regarding detection of malicious smart contracts and phishing websites, I will propose a detailed evaluation methodology.
> We can directly access the parameters used by the wallets during the simulation via API endpoint or indirectly.
I'm not sure that's universally true. For example, I'd imagine that it is feasible to design a simulation endpoint that does not take a timestamp (or other such parameters), and which internally (server-side) tries to perform multiple simulations that use random permutations of such parameters (e.g. simulate with multiple randomly-selected timestamps). That would allow it to detect non-determinism without the client needing to explicitly specify a non-canonical timestamp, right?
> Here we are evaluating whether the wallet’s transaction simulation is deterministic, and we give it a score. We test it using different environment-parameter values. The purpose is not to flag malicious contracts, but rather to quantify whether the wallet’s simulation reproduces on-chain state transitions and accurately reflects the user’s intent. This method allows us to evaluate the determinism of the simulation, aka its ability to reflect the user's intent. The objective is to collect and feed the framework with all the parameters and variables that guarantee determinism. At this stage we have the block-header parameters.
Right, but I think this is conflating transaction determinism with transaction simulation fidelity. For the purpose of this attribute, we don't just want to verify that a wallet can produce a consistent simulation result; we want to verify that this result is useful for avoiding scam contracts. So simulation consistency is part of the requirement, but not sufficient by itself.
My understanding of the explanation above is that a transaction is an execution of the chain's transition function, and it must be deterministic. It is influenced by certain environment parameters (block timestamp etc) which are known at transaction inclusion time (making inclusion-time execution deterministic), but not exactly known at simulation time.
Therefore, in order to be useful at preventing scams, a wallet's transaction simulation feature must simultaneously:
- Be able to deterministically reproduce the effect of a user's transaction given current chain state, knowing that this may not necessarily be exactly the same at transaction inclusion time, AND
- Be able to verify whether the simulation-time result it is showing to the user is meaningful for the purpose of avoiding scams, i.e. that the transaction's effect is likely to be the same using current chain state as it would be at transaction inclusion time. One possible implementation of this would be to have it internally simulate the transaction with different permutations of environment parameters to observe whether the outcome changes.
- If the outcome does change in these alternate simulations, then the simulation feature knows that its initial simulated outcome is not reflective of the transaction's inclusion-time outcome, and thus it may be dealing with a scam contract that is attempting to avoid simulation detection by changing its behavior based on these parameters.
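The "simulate under several permutations" idea in the list above could look roughly like the following. This is a sketch under stated assumptions, not a description of any wallet's engine: `outcome_is_stable`, the two toy contracts, and every parameter value are hypothetical, and a real engine would call an EVM instead of a Python function.

```python
import itertools
from typing import Callable, Mapping, Sequence

Env = Mapping[str, int]


def outcome_is_stable(
    simulate: Callable[[Env], object],
    base_env: Env,
    alternatives: Mapping[str, Sequence[int]],
) -> bool:
    """Return True if every permuted environment yields the baseline outcome."""
    baseline = simulate(base_env)
    names = list(alternatives)
    for combo in itertools.product(*(alternatives[name] for name in names)):
        env = {**base_env, **dict(zip(names, combo))}
        if simulate(env) != baseline:
            return False
    return True


def gas_gated(env: Env) -> str:
    # Toy scam contract: behaves well only when plenty of gas is supplied.
    return "refund" if env["gas_limit"] > 15_000_000 else "steal"


def plain_transfer(env: Env) -> str:
    # Toy honest contract: ignores the environment entirely.
    return "transfer"


base_env = {"block_timestamp": 1_700_000_000, "gas_limit": 15_780_760}
alternatives = {
    "block_timestamp": [1_700_000_012, 1_700_003_600],
    "gas_limit": [100_000, 30_000_000],
}

print(outcome_is_stable(gas_gated, base_env, alternatives))       # False
print(outcome_is_stable(plain_transfer, base_env, alternatives))  # True
```

An unstable result does not prove the contract is malicious; it only tells the wallet that the outcome it is about to display may not hold at inclusion time.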
Do I have this right?
I'm providing some context here so that the determinism attribute of transaction simulation is well-defined and to express how it differs from simulation fidelity.
In general, Transaction Simulation solutions are exposed to three challenges:
- Fake events from malicious contracts
- Simulation environment parameters that differ from the actual blockchain environment
- State manipulation between simulation and execution
Take the 3rd challenge: if a contract's state changes between the time of simulation and the time of execution, the simulation is deterministic and faithful, but useless to the user.
Compare with the 2nd challenge: if a malicious contract is simulated in a wallet engine that uses at least one variable that differs from the actual blockchain environment, we say that the transaction simulation is non-deterministic. This is the simulation issue we want to tackle at this stage.
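The difference between these two challenges can be shown with a toy model in which the outcome depends on both contract state and environment. Everything here is hypothetical (the `execute` contract, the `drain_enabled` flag, the gas values); it only illustrates which input changed between simulation and execution.

```python
def execute(state: dict, env: dict) -> str:
    """Toy contract: drains if its owner flipped a switch, or if gas is low."""
    if state["drain_enabled"] or env["gas_limit"] <= 15_000_000:
        return "drain"
    return "refund"


def classify_divergence(sim_state, sim_env, exec_state, exec_env) -> str:
    """Name the input that differs between simulation and execution."""
    env_differs = sim_env != exec_env
    state_differs = sim_state != exec_state
    if env_differs and state_differs:
        return "both"
    if env_differs:
        return "environment mismatch (2nd challenge)"
    if state_differs:
        return "state manipulation (3rd challenge)"
    return "none"


sim_state, sim_env = {"drain_enabled": False}, {"gas_limit": 15_780_760}

# 2nd challenge: same state, but the simulator used an unrealistic gas limit.
real_env = {"gas_limit": 100_000}
print(execute(sim_state, sim_env), execute(sim_state, real_env))
print(classify_divergence(sim_state, sim_env, sim_state, real_env))

# 3rd challenge: same environment, but state changed after the simulation.
later_state = {"drain_enabled": True}
print(execute(sim_state, sim_env), execute(later_state, sim_env))
print(classify_divergence(sim_state, sim_env, later_state, sim_env))
```

In both cases the user sees "refund" and gets "drain", but only the first case is something a more faithful simulation environment can fix.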
So yes: simulation consistency is necessary for avoiding scam contracts, but it is not sufficient by itself.
We can add the other two categories later. Also, I want to clarify that the detection of malicious smart contracts is a separate attribute entirely, which we will address later.