Pattern PUSH0 SHR appears in optimized code
Description
Optimizer does not optimize out PUSH0 SHR instructions ( x >> 0 is always x)
Environment
- Compiler version: solc v0.8.23
- Target EVM version (as per compiler settings): default
- Framework/IDE (e.g. Truffle or Remix): none
- EVM execution environment / backend / blockchain client: none
- Operating system: Linux
- Compiled with command:
solc/solc0.8.23 -o build/examples/push0shr.sol/ --overwrite --ast-compact-json --asm --bin-runtime --bin --optimize --optimize-runs 200000 --abi --combined-json=srcmap-runtime,generated-sources-runtime examples/push0shr.so
Steps to Reproduce
compile the following example with the above command:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;
contract Push0Shr
{
uint256 rand_counter = 0;
function random() public returns(uint256)
{
rand_counter += 1;
return uint256(keccak256(abi.encodePacked(block.prevrandao, block.timestamp, rand_counter, msg.sender)));
}
}
The pattern appears to be generated by the source location s:l:f = (207, 97, 0)
@ekpyron This is what I mentioned in our chat, it should be reproducible from this
note: there could be considerations that this optimization could make an execution diff if the stack had no items on it prior to the PUSH0 instruction, where without the optimization the EVM would produce an error but with the optimization it would continue. I'm not sure this is important to consider, since we shouldn't get to the point where the compiler generates SHR with only one value on stack
This seems to be an issue only in the legacy pipeline. With --via-ir the shift by zero gets optimized out by Yul optimizer.
IR pipeline
IR
solc test.sol --debug-info none --optimize --optimize-runs 200000 --ir | grep shr
shr(224, value)
shr(0, value)
Optimized IR
solc test.sol --debug-info none --optimize --optimize-runs 200000 --ir-optimized | grep shr
if eq(0x5ec01e4d, shr(224, calldataload(0)))
Assembly
solc test.sol --debug-info none --optimize --optimize-runs 200000 --asm --via-ir | grep shr --context=1
tag_1:
jumpi(tag_3, eq(0x5ec01e4d, shr(0xe0, calldataload(0x00))))
0x00
legacy
Assembly
solc test.sol --debug-info none --optimize --optimize-runs 200000 --asm | grep shr --context=1
jumpi(tag_2, lt(calldatasize, 0x04))
shr(0xe0, calldataload(0x00))
dup1
--
0x00
shr
swap1
Legacy and via-IR share the same set of rules here, though... I'd have guessed that we generate a PUSH0 as an AssemblyItem instruction somewhere instead of a push with value zero and that's what causes the rules not to kick in, but may also be something else - it's strange in any case and worth a look into the cause.
I looked into this and found out that the problem also happens with evm versions prior to shanghai (which introduced PUSH0).
Compiling with, for example, paris:
solc test.sol --evm-version paris --debug-info none --optimize --optimize-runs 200000 --asm | grep shr --context=1
also generates the pattern:
--
jumpi(tag_2, lt(calldatasize, 0x04))
shr(0xe0, calldataload(0x00))
dup1
--
0x00
shr
swap1
I then investigated the Common Subexpression Eliminator (CSE) and found out that the optimizer can successfully identify and replace the pattern <ANY> 0 SHR, but cannot do so in the presence of PUSH0 instruction. I confirmed this by adding two test cases to test/libevmasm/Optimiser.cpp:
BOOST_AUTO_TEST_CASE(cse_replace_shr_push_value0_any)
{
AssemblyItems input {
AssemblyItem(20),
Instruction::KECCAK256,
AssemblyItem(0),
Instruction::SHR,
Instruction::JUMP
};
AssemblyItems output {
AssemblyItem(20),
Instruction::KECCAK256,
Instruction::JUMP
};
checkCSE(input, output);
checkFullCSE(input, output);
}
BOOST_AUTO_TEST_CASE(cse_replace_shr_push0_instruction_any)
{
AssemblyItems input {
AssemblyItem(20),
Instruction::KECCAK256,
Instruction::PUSH0,
Instruction::SHR,
Instruction::JUMP
};
AssemblyItems output {
AssemblyItem(20),
Instruction::KECCAK256,
Instruction::PUSH0,
Instruction::SHR,
Instruction::JUMP
};
checkCSE(input, output);
checkFullCSE(input, output);
}
The problem occurs in the specific example provided in this issue, because when the CSE optimizes the block (between two control flow breaking instructions) in which the SHR instruction is, the size of the block is equal to the original one. The CSE only applies the optimizations if the new size is less than the original one.
original asm items (size 24)
PUSH 40 MLOAD PUSH 20 DUP2 DUP4 SUB SUB DUP2 MSTORE SWAP1 PUSH 40 MSTORE DUP1 MLOAD SWAP1 PUSH 20 ADD KECCAK256 PUSH 0 SHR SWAP1 POP SWAP1 JUMP [out]
optimized asm items (size 24)
PUSH 40 DUP1 MLOAD PUSH ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe0 DUP2 DUP5 SUB ADD DUP2 MSTORE SWAP2 SWAP1 MSTORE DUP1 MLOAD PUSH 20 SWAP1 SWAP2 ADD KECCAK256 SWAP2 SWAP1 POP JUMP [out]