perf: branchless square root
Branchless Square Root
Optimized sqrt implementation in Yul that avoids implicit and explicit branches, which are relatively expensive on the EVM. It eliminates Solidity's division checks, reuses and improves msb, and expresses conditionals arithmetically. The result is 49% lower gas cost on average and a constant 407 gas per call.
Estimated Gas Cost
| Old | New | %diff | |
|---|---|---|---|
| Maximum | 807 | 407 | -49.6% |
Average over uint256 |
798.1 | 407 | -49.0% |
| Standard Deviation | 9.8 | 0 | -100% |
| Minimum ($x > 0$) | 638 | 407 | -36.2% |
| $x = 0$ | 74 | 407 | +450% |
Code Transformations
Branchless log2 (-40 gas on average)
The prior code branched once per region in uint256, costing ~14 gas (PUSH/JUMPI/JUMPDEST) per branch plus 24-30 gas when taken. About half of those branches fire on average, so switching to msb() already saves gas, even though msb() does one extra iteration (msb0), which sqrt discards.
Slight regression at this step: inputs that previously short-circuited (including values around 1e18) go from ~700 to ~754 gas because early-outs are gone. Acceptable, because this change enables the msb() optimization below, which cuts 60 gas and is required to reach the final 407 gas.
Reorder instructions on msb (-75 gas)
msb() previously updted x as it progressed. On a stack machine (EVM) this forces extra SWAP/DUP traffic. We can compute x >> result every iteration using the same number of shr(). Saves 75 gas.
Branchless "perfect square" condition (-42 gas on average)
The final iterate is always either $\lfloor\sqrt{x}\rfloor$ or $\lfloor\sqrt{x}\rfloor + 1$. So we replace result = min(result, x / result) with a boolean subtract: result -= (result > x / result) ? 1 : 0. Yul implementation drops gas from 679-687 to 641.
Unchecked division (-196 gas)
Solidity still inserts a division-by-zero check, even in unchecked blocks (Checked or Unchecked Arithmetic). The only way to avoid it is to use assembly directly, which yields the biggest win here: -196 gas.
Skip condition for zero (-38 gas on average)
With division in Yul, we drop the x == 0 branch and rely on the EVM semantics div(a, 0) = 0 (EVM Codes - DIV). This saves 38 gas for every nonzero input; $x = 0$ regresses by +333. This is intentional: inputs are expected to be non-trivial most of the time ($\geq 90\%$), so the average gas cost improves.
Optimized inlined msb (-21 gas, not implemented)
Inlining msb() into sqrt and skipping msb0 saves 21 gas, but the readability hit isn't worth it IMO, so I left it out. I can bring it back if you like it.