solidity
solidity copied to clipboard
Reverse lookup for DataFlowAnalyzer
Part of https://github.com/ethereum/solidity/issues/13719 and https://github.com/ethereum/solidity/issues/13822.
Benchmarking is run on the contract.
Mostly helps with CSE (CommonSubexpressionEleminator
) and LiteralRematerialiser
, LoadResolver
and ExpressionSimplifier
, i.e. steps that inherit from and thus rely on the DataFlowAnalyzer
. The results are as follows:
This PR is essentially a rework of @chriseth's Improve DataFlowAnalyzer PR, which includes many other improvements (of which I think all were merged already in separate PRs).
The question here is whether we'd prefer the approach here (where the in-order and reverse lookup are wrapped in a separate class), or in Chris' PR, where the reverse lookup is just added to the DataFlowAnalyzer::State
object and then used directly from there.
develop (baseline)
======================================
0.000% (8e-06 s): FunctionGrouper
0.001% (1.4e-05 s): VarDeclInitializer
0.001% (2.3e-05 s): ForLoopInitRewriter
0.003% (8.1e-05 s): FunctionHoister
0.017% (0.000421 s): ExpressionInliner
0.035% (0.00085 s): UnusedFunctionParameterPruner
0.075% (0.001829 s): ForLoopConditionIntoBody
0.077% (0.001873 s): ForLoopConditionOutOfBody
0.082% (0.002001 s): SSAReverser
0.093% (0.002263 s): StructuralSimplifier
0.135% (0.003282 s): BlockFlattener
0.148% (0.003606 s): CircularReferencesPruner
0.223% (0.00542 s): ExpressionJoiner
0.235% (0.005713 s): EquivalentFunctionCombiner
0.313% (0.007626 s): Rematerialiser
0.335% (0.008165 s): LoopInvariantCodeMotion
0.561% (0.013661 s): FunctionSpecializer
0.845% (0.020568 s): ExpressionSplitter
0.902% (0.021972 s): ConditionalUnsimplifier
0.961% (0.023401 s): ConditionalSimplifier
1.059% (0.025783 s): ControlFlowSimplifier
1.072% (0.026108 s): DeadCodeEliminator
1.551% (0.037774 s): EqualStoreEliminator
3.646% (0.088806 s): FullInliner
3.853% (0.09384 s): UnusedStoreEliminator
4.819% (0.117355 s): UnusedPruner
7.358% (0.179209 s): SSATransform
7.365% (0.179365 s): UnusedAssignEliminator
9.883% (0.240684 s): ExpressionSimplifier
11.218% (0.273217 s): LoadResolver
16.676% (0.40613 s): LiteralRematerialiser
26.459% (0.644396 s): CommonSubexpressionEliminator
--------------------------------------
100% (2.435 s)
with reverse lookup (this)
Performance metrics of optimizer steps
======================================
0.001% (1e-05 s): FunctionGrouper
0.001% (2.1e-05 s): VarDeclInitializer
0.002% (2.9e-05 s): ForLoopInitRewriter
0.006% (9.3e-05 s): FunctionHoister
0.026% (0.00041 s): ExpressionInliner
0.053% (0.000846 s): UnusedFunctionParameterPruner
0.113% (0.001798 s): ForLoopConditionIntoBody
0.125% (0.001986 s): ForLoopConditionOutOfBody
0.134% (0.002124 s): SSAReverser
0.146% (0.002312 s): StructuralSimplifier
0.183% (0.002902 s): BlockFlattener
0.228% (0.003604 s): CircularReferencesPruner
0.332% (0.005263 s): ExpressionJoiner
0.336% (0.005319 s): Rematerialiser
0.367% (0.005814 s): EquivalentFunctionCombiner
0.514% (0.008142 s): LoopInvariantCodeMotion
0.963% (0.015248 s): FunctionSpecializer
1.211% (0.019182 s): EqualStoreEliminator
1.309% (0.020744 s): ExpressionSplitter
1.360% (0.021539 s): ConditionalUnsimplifier
1.429% (0.022642 s): ConditionalSimplifier
1.634% (0.02588 s): ControlFlowSimplifier
1.656% (0.026226 s): DeadCodeEliminator
5.166% (0.081833 s): FullInliner
5.865% (0.092911 s): UnusedStoreEliminator
7.376% (0.116846 s): UnusedPruner
8.498% (0.134622 s): LoadResolver
8.557% (0.135551 s): ExpressionSimplifier
9.066% (0.143613 s): LiteralRematerialiser
11.084% (0.175588 s): SSATransform
11.256% (0.178307 s): UnusedAssignEliminator
21.004% (0.332742 s): CommonSubexpressionEliminator
--------------------------------------
100% (1.584 s)
Also, @chriseth, do you have a suggestion for the changelog entry?
Looks good! Please squash.
Looks good! Please squash.
Done!
I checked how this PR affects compilation times in external tests.
Here are the times for ir-optimize-evm+yul
preset from
run 29920 on develop
(8c7404f639b19a30d0e0ace31002bb48a7a6f5c4)
vs
run 29928 on dataflow-analyzer-reverse-lookup
(a8cc9bde9b64d44e19b59ada18412675654e5d22).
test | develop |
this PR | difference |
---|---|---|---|
zeppelin | 6m48.256s | 5m46.622s | -1:02 |
colony | 1m51.012s | 1m27.582s | -0:24 |
uniswap | 1m46.737s | 1m40.339s | -0:06 |
prb-math | 0m13.031s | 0m10.887s | -0:03 |
gp2 | 0m44.495s | 0m43.621s | -0:01 |
brink | 0m04.842s | 0m05.302s | +0:01 |
ens | 0m45.856s | 0m48.436s | +0:03 |
perpetual-pools | 1m28.025s | 1m32.749s | +0:04 |
yield-liquidator | 0m31.450s | 0m38.634s | +0:07 |
elementfi | 2m48.506s | 3m15.299s | +0:27 |
I wonder what's happening with ElementFi. I double checked I got the numbers from the right pages and looks like it really took ~30 s longer. May be worth checking if it's repeatable or just a fluke. Still, some other times increased too so there might be something more to it.
The table contains only the real
time. Full results are below in case anyone's interested.
zeppelin
real 6m48.256s user 6m47.482s sys 0m2.189s
real 5m46.622s user 5m46.191s sys 0m1.551s
uniswap
real 1m46.737s user 1m48.668s sys 0m0.732s
real 1m40.339s user 1m41.697s sys 0m0.673s
ens
real 0m45.856s user 0m45.927s sys 0m0.656s
real 0m48.436s user 0m48.450s sys 0m0.580s
gp2
real 0m44.495s user 0m52.945s sys 0m1.442s
real 0m43.621s user 0m51.973s sys 0m1.369s
brink
real 0m4.842s user 0m4.763s sys 0m0.603
real 0m5.302s user 0m5.272s sys 0m0.556s
elementfi
real 2m48.506s user 2m56.110s sys 0m2.225s
real 3m15.299s user 3m24.894s sys 0m2.584s
perpetual-pools
real 1m28.025s user 1m30.875s sys 0m1.107s
real 1m32.749s user 1m36.234s sys 0m1.284s
prb-math
real 0m13.031s user 0m13.804s sys 0m0.469s
real 0m10.887s user 0m11.300s sys 0m0.444s
yield-liquidator
real 0m31.450s user 0m34.304s sys 0m0.567s
real 0m38.634s user 0m41.636s sys 0m0.534s
colony
real 1m51.012s user 2m17.810s sys 0m3.484s
real 1m27.582s user 1m50.964s sys 0m2.876s
This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale
label is removed.
I gathered fresh timing data (and translated the original data into the same format).
- It seems to me that all the timing differences we see in external tests, even the positive ones may be just noise:
- The differences are much smaller now. Especially for
zeppelin
,elementfi
andcolony
, which originally had the largest difference, the difference not only decreased but completely reversed direction. - In some cases the differences flip between positive and negative in different runs for the same project.
-
zeppelin
anduniswap
compile much faster ondevelop
now. In case ofzeppelin
this could very well be due to upstream changes in tests. Foruniswap
though we have a forked repo.
- The differences are much smaller now. Especially for
- On the other hand, there is actually a 2x difference in our own timing benchmark - though only for a single contract (
chains.sol
). Others, e.g.OptimizorClub.sol
, do not seem affected.
My conclusion here is that there's no strong evidence that this change significantly affects external tests, be it positively or negatively. If there is a difference, it's lower than the normal variance of CI timing. Still, the change does seem to help one especially pathological contract from our repo (chains.sol
). I wonder if it might be because profiling was done specifically on that contract and the PR addresses its specific bottleneck and the bottleneck happens to be different than in more common scenarios?
Original timing
Project | develop |
this PR |
---|---|---|
brink | 5 s | 5 s |
colony | 111 s | 88 s |
elementfi | 169 s | 195 s |
ens | 46 s | 48 s |
euler | ||
gnosis | ||
gp2 | 44 s | 44 s |
perpetual-pools | 88 s | 93 s |
pool-together | ||
uniswap | 107 s | 100 s |
yield_liquidator | 31 s | 39 s |
zeppelin | 408 s | 347 s |
prb-math | 13 s | 11 s |
Current timing
Project | develop |
this PR (run 1) | this PR (run 2) | this PR (run 3) |
---|---|---|---|---|
brink | 5 s | 4 s | 5 s | 5 s |
colony | 105 s | 113 s | 110 s | 114 s |
elementfi | 172 s | 162 s | 163 s | 167 s |
ens | 40 s | 43 s | 43 s | 47 s |
euler | 52 s | 59 s | 62 s | 72 s |
gnosis | ||||
gp2 | 42 s | 48 s | 47 s | 60 s |
perpetual-pools | 81 s | 72 s | 84 s | 79 s |
pool-together | 56 s | 49 s | 51 s | 61 s |
uniswap | 82 s | 84 s | 71 s | 84 s |
yield_liquidator | 26 s | 27 s | 26 s | 25 s |
zeppelin | 261 s | 266 s | 274 s | 277 s |
prb-math |
-
run on
develop
-
run 1 on
dataflow-analyzer-reverse-lookup
-
run 2 on
dataflow-analyzer-reverse-lookup
-
run 3 on
dataflow-analyzer-reverse-lookup
Timing diff with develop
Project | Diff (original) | Diff (run 1) | Diff (run 2) | Diff (run 3) |
---|---|---|---|---|
brink | 1 s | -1 s | 0 s | 0 s |
colony | -24 s | 8 s | 5 s | 9 s |
elementfi | 27 s | -10 s | -9 s | -5 s |
ens | 3 s | 3 s | 3 s | 7 s |
euler | 7 s | 10 s | 20 s | |
gnosis | ||||
gp2 | -1 s | 6 s | 5 s | 18 s |
perpetual-pools | 4 s | -9 s | 3 s | -2 s |
pool-together | -7 s | -5 s | 5 s | |
uniswap | -6 s | 2 s | -11 s | 2 s |
yield_liquidator | 7 s | 1 s | 0 s | -1 s |
zeppelin | -62 s | 5 s | 13 s | 16 s |
prb-math | -3 s |
Timing benchmark
develop
Binary from b_ubu_static
from the last run on develop
File | Pipeline | Bytecode size | Time | Exit code |
---|---|---|---|---|
verifier.sol |
legacy | 4940 bytes | 0.20 s | 0 |
verifier.sol |
via-ir | 4417 bytes | 0.66 s | 0 |
OptimizorClub.sol |
legacy | 0 bytes | 0.52 s | 1 |
OptimizorClub.sol |
via-ir | 22391 bytes | 4.05 s | 0 |
chains.sol |
legacy | 5878 bytes | 0.17 s | 0 |
chains.sol |
via-ir | 23076 bytes | 23.16 s | 0 |
this PR
Binary from b_ubu_static
from the last run on dataflow-analyzer-reverse-lookup
File | Pipeline | Bytecode size | Time | Exit code |
---|---|---|---|---|
verifier.sol |
legacy | 4940 bytes | 0.14 s | 0 |
verifier.sol |
via-ir | 4417 bytes | 0.65 s | 0 |
OptimizorClub.sol |
legacy | 0 bytes | 0.57 s | 1 |
OptimizorClub.sol |
via-ir | 22391 bytes | 4.23 s | 0 |
chains.sol |
legacy | 5878 bytes | 0.17 s | 0 |
chains.sol |
via-ir | 23076 bytes | 10.52 s | 0 |
This PR has been open for more than a year, with the improvements somewhat inconclusive. I'll be closing it, at least for the time being, since we're working on different (macro) improvements on the via-ir pipeline, which should yield significantly better results. We can re-open this in the future if need be, or use it as a reference at the very least.