solidity do-while results in redundant branching instructions when compiling via IR

Description

When using a do-while control flow statement such as in the following code:

contract Sum {
    function sum(uint256[] calldata nums) public pure returns (uint256 total) {
        if (nums.length == 0) return 0;
        uint256 i = 0;
        do {
            assembly ("memory-safe") {
                total := add(calldataload(add(nums.offset, shl(5, i))), total)
                i := add(i, 1)
            }
        } while (i < nums.length);
    }
}

I expect solidity to generate the relatively straight-forward & maximally efficient assembly:

tag_doWhileBodyEntry:
    <doWhileBody>
    <whileCondition>
    tag_doWhileBodyEntry
    jumpi

This is the case when compiling with the legacy pipeline but using the IR pipeline gives you roughly the following structure:

    0x01
tag_doWhileEntry:
    iszero
    tag_whileCondition
    jumpi
tag_doWhileBody:
    <doWhileBody>
    0x00
    tag_doWhileEntry
    jump
tag_whileCondition:
    <whileCondition>
    tag_doWhileBody
    jumpi

This structure, while correct is not efficient. Digging deeper the origin of this structure becomes more apparent when looking at the generated IR:

function fun_sum(var_nums_offset, var_nums_length) -> var_total {
    var_total := 0
    if iszero(var_nums_length) {
        var_total := 0
        leave
    }
    let var_i := 0
    let _1 := 1
    for { } 1 { } {
        if iszero(_1) {
            if iszero(lt(var_i, var_nums_length)) { break }
        }
        _1 :=  0
        var_total := add(calldataload(add(var_nums_offset, shl(5, var_i))), var_total)
        var_i := add(var_i, 1)
    }
}

Environment

Compiler version: 0.8.27
Target EVM version (as per compiler settings): cancun
Framework/IDE (e.g. Truffle or Remix): foundry
EVM execution environment / backend / blockchain client: viaIR
Operating system: macOS

Steps to Reproduce

Compile the following contract with viaIR

contract Sum {
    function sum(uint256[] calldata nums) public pure returns (uint256 total) {
        if (nums.length == 0) return 0;
        uint256 i = 0;
        do {
            assembly ("memory-safe") {
                total := add(calldataload(add(nums.offset, shl(5, i))), total)
                i := add(i, 1)
            }
        } while (i < nums.length);
    }
}

Sep 19 '24 10:09 Philogy

Yes, I can confirm that this is how it works, but unfortunately I don't think we can do much about it, at least in the short term. This a case where Yul not having jumps makes it impossible to implement it more efficiently in a general case.

Since Yul has no native do...while loop, we simulate it with a for loop. Naively one could do it like this:

for {} true {} {
    <body>
    if iszero(<condition>) { break }
}

This should give you the assembly you want, however, it will break if you put continue in the body. The codegen translates it to Yul's continue, which will not jump to the right condition here. The legacy codegen does not have this problem, since it can use arbitrary jumps and just inserts a tag before the right condition, making continue jump there.

To deal with this problem, we generate this instead, which is the reason for the less efficient assembly you get via IR:

let firstRun := 1
for {} true {} {
    if iszero(firstRun) {
        if iszero(<condition>) { break }
    }
    firstRun := 0
    <body>
}

Since this affects only code that contains continue, one option would be to detect its presence and generate the simpler code when it's not there. But I'm not convinced that the extra special-casing is worth it as it goes hard against the general philosophy of keeping the IR codegen simple. It is technically an option, as we did it allow one such case in the past (Optimizer > Unchecked Loop Increment), but this one does not seem to be on the same level in terms of how common it is and what impact it has.

We'd rather do things like this in the optimizer, and we may eventually do it, but we have many potential optimizations in the pipeline and this one does not seem like it would be high on the priority list.

Sep 20 '24 10:09 cameel

but unfortunately I don't think we can do much about it

Why couldn't Yul be expanded/improved?

If the goal of Yul is to be simple to compile then forcing the addition of such specialized passes in the backend seems like it'd be going against that motto? Maybe wrong venue but I'd love to learn more about how the solc team thinks about language & compiler design.

Sep 27 '24 02:09 Philogy

It could be extended, but at this point we're not too keen on doing that unless there is no other way to address the issue. We want Yul to be a stable target for intermediate compilation. In this particular case the benefits are not that clear-cut. It would help only this single case, and at a relatively high cost of adding a completely new construct that has to be accounted for by every tool that processes the language. Yul is very simple and we want it to stay that way. It's actually already sub-optimal that we ended up with for as the only looping construct rather than much simpler while.

Addressing it in the optimizer is the much preferred option. After talking it over internally, it looks like it might be doable even in the general case - if done as control flow graph simplification rather than as a transformation of Yul code. I added it to our list of Assorted Optimizer Ideas.

Having said that, there's actually a different potential Yul feature we've been considering recently, which might allow us to fix this issue as well: multi-level break statement. This came up in our language research group in the context of translating from some other language to Yul. Something like this has many more useful applications so it might be actually worth it. But we'll see. For now it's just a fresh idea.

Oct 16 '24 19:10 cameel