solidity Pattern PUSH0 SHR appears in optimized code

Description

Optimizer does not optimize out PUSH0 SHR instructions ( x >> 0 is always x)

Environment

Compiler version: solc v0.8.23
Target EVM version (as per compiler settings): default
Framework/IDE (e.g. Truffle or Remix): none
EVM execution environment / backend / blockchain client: none
Operating system: Linux
Compiled with command: solc/solc0.8.23 -o build/examples/push0shr.sol/ --overwrite --ast-compact-json --asm --bin-runtime --bin --optimize --optimize-runs 200000 --abi --combined-json=srcmap-runtime,generated-sources-runtime examples/push0shr.so

Steps to Reproduce

compile the following example with the above command:

// SPDX-License-Identifier: MIT

pragma solidity ^0.8.20;


contract Push0Shr
{   
    uint256 rand_counter = 0;
    function random() public returns(uint256)
    {
        rand_counter += 1;
        return uint256(keccak256(abi.encodePacked(block.prevrandao, block.timestamp, rand_counter, msg.sender))); 
    }
}

The pattern appears to be generated by the source location s:l:f = (207, 97, 0)

Apr 12 '24 16:04 veniger

@ekpyron This is what I mentioned in our chat, it should be reproducible from this

Apr 12 '24 16:04 veniger

note: there could be considerations that this optimization could make an execution diff if the stack had no items on it prior to the PUSH0 instruction, where without the optimization the EVM would produce an error but with the optimization it would continue. I'm not sure this is important to consider, since we shouldn't get to the point where the compiler generates SHR with only one value on stack

Apr 12 '24 16:04 veniger

This seems to be an issue only in the legacy pipeline. With --via-ir the shift by zero gets optimized out by Yul optimizer.

IR pipeline

IR

solc test.sol --debug-info none --optimize --optimize-runs 200000 --ir | grep shr

                shr(224, value)
                shr(0, value)

Optimized IR

solc test.sol --debug-info none --optimize --optimize-runs 200000 --ir-optimized | grep shr

                    if eq(0x5ec01e4d, shr(224, calldataload(0)))

Assembly

solc test.sol --debug-info none --optimize --optimize-runs 200000 --asm --via-ir | grep shr --context=1

    tag_1:
      jumpi(tag_3, eq(0x5ec01e4d, shr(0xe0, calldataload(0x00))))
      0x00

legacy

Assembly

solc test.sol --debug-info none --optimize --optimize-runs 200000 --asm | grep shr --context=1

      jumpi(tag_2, lt(calldatasize, 0x04))
      shr(0xe0, calldataload(0x00))
      dup1
--
      0x00
      shr
      swap1

Apr 12 '24 16:04 cameel

Legacy and via-IR share the same set of rules here, though... I'd have guessed that we generate a PUSH0 as an AssemblyItem instruction somewhere instead of a push with value zero and that's what causes the rules not to kick in, but may also be something else - it's strange in any case and worth a look into the cause.

Apr 15 '24 12:04 ekpyron

I looked into this and found out that the problem also happens with evm versions prior to shanghai (which introduced PUSH0). Compiling with, for example, paris:

solc test.sol --evm-version paris --debug-info none --optimize --optimize-runs 200000 --asm | grep shr --context=1

also generates the pattern:

--
      jumpi(tag_2, lt(calldatasize, 0x04))
      shr(0xe0, calldataload(0x00))
      dup1
--
      0x00
      shr
      swap1

I then investigated the Common Subexpression Eliminator (CSE) and found out that the optimizer can successfully identify and replace the pattern <ANY> 0 SHR, but cannot do so in the presence of PUSH0 instruction. I confirmed this by adding two test cases to test/libevmasm/Optimiser.cpp:

BOOST_AUTO_TEST_CASE(cse_replace_shr_push_value0_any)
{
	AssemblyItems input {
		AssemblyItem(20),
		Instruction::KECCAK256,
		AssemblyItem(0),
		Instruction::SHR,
		Instruction::JUMP
	};

	AssemblyItems output {
		AssemblyItem(20),
		Instruction::KECCAK256,
		Instruction::JUMP
	};

	checkCSE(input, output);
	checkFullCSE(input, output);
}

BOOST_AUTO_TEST_CASE(cse_replace_shr_push0_instruction_any)
{
	AssemblyItems input {
		AssemblyItem(20),
		Instruction::KECCAK256,
		Instruction::PUSH0,
		Instruction::SHR,
		Instruction::JUMP
	};

	AssemblyItems output {
		AssemblyItem(20),
		Instruction::KECCAK256,
		Instruction::PUSH0,
		Instruction::SHR,
		Instruction::JUMP
	};

	checkCSE(input, output);
	checkFullCSE(input, output);
}

The problem occurs in the specific example provided in this issue, because when the CSE optimizes the block (between two control flow breaking instructions) in which the SHR instruction is, the size of the block is equal to the original one. The CSE only applies the optimizations if the new size is less than the original one.

original asm items (size 24)

PUSH 40 MLOAD PUSH 20 DUP2 DUP4 SUB SUB DUP2 MSTORE SWAP1 PUSH 40 MSTORE DUP1 MLOAD SWAP1 PUSH 20 ADD KECCAK256 PUSH 0 SHR SWAP1 POP SWAP1 JUMP [out]

optimized asm items (size 24)

PUSH 40 DUP1 MLOAD PUSH ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe0 DUP2 DUP5 SUB ADD DUP2 MSTORE SWAP2 SWAP1 MSTORE DUP1 MLOAD PUSH 20 SWAP1 SWAP2 ADD KECCAK256 SWAP2 SWAP1 POP JUMP [out]

May 26 '24 18:05 matheusaaguiar