solidity Reverse lookup for DataFlowAnalyzer

Part of https://github.com/ethereum/solidity/issues/13719 and https://github.com/ethereum/solidity/issues/13822.

Benchmarking is run on the contract.

Mostly helps with CSE (CommonSubexpressionEleminator) and LiteralRematerialiser, LoadResolver and ExpressionSimplifier, i.e. steps that inherit from and thus rely on the DataFlowAnalyzer. The results are as follows:

This PR is essentially a rework of @chriseth's Improve DataFlowAnalyzer PR, which includes many other improvements (of which I think all were merged already in separate PRs). The question here is whether we'd prefer the approach here (where the in-order and reverse lookup are wrapped in a separate class), or in Chris' PR, where the reverse lookup is just added to the DataFlowAnalyzer::State object and then used directly from there.

develop (baseline)

======================================
  0.000% (8e-06 s): FunctionGrouper
  0.001% (1.4e-05 s): VarDeclInitializer
  0.001% (2.3e-05 s): ForLoopInitRewriter
  0.003% (8.1e-05 s): FunctionHoister
  0.017% (0.000421 s): ExpressionInliner
  0.035% (0.00085 s): UnusedFunctionParameterPruner
  0.075% (0.001829 s): ForLoopConditionIntoBody
  0.077% (0.001873 s): ForLoopConditionOutOfBody
  0.082% (0.002001 s): SSAReverser
  0.093% (0.002263 s): StructuralSimplifier
  0.135% (0.003282 s): BlockFlattener
  0.148% (0.003606 s): CircularReferencesPruner
  0.223% (0.00542 s): ExpressionJoiner
  0.235% (0.005713 s): EquivalentFunctionCombiner
  0.313% (0.007626 s): Rematerialiser
  0.335% (0.008165 s): LoopInvariantCodeMotion
  0.561% (0.013661 s): FunctionSpecializer
  0.845% (0.020568 s): ExpressionSplitter
  0.902% (0.021972 s): ConditionalUnsimplifier
  0.961% (0.023401 s): ConditionalSimplifier
  1.059% (0.025783 s): ControlFlowSimplifier
  1.072% (0.026108 s): DeadCodeEliminator
  1.551% (0.037774 s): EqualStoreEliminator
  3.646% (0.088806 s): FullInliner
  3.853% (0.09384 s): UnusedStoreEliminator
  4.819% (0.117355 s): UnusedPruner
  7.358% (0.179209 s): SSATransform
  7.365% (0.179365 s): UnusedAssignEliminator
  9.883% (0.240684 s): ExpressionSimplifier
 11.218% (0.273217 s): LoadResolver
 16.676% (0.40613 s): LiteralRematerialiser
 26.459% (0.644396 s): CommonSubexpressionEliminator
--------------------------------------
    100% (2.435 s)

with reverse lookup (this)

Performance metrics of optimizer steps
======================================
  0.001% (1e-05 s): FunctionGrouper
  0.001% (2.1e-05 s): VarDeclInitializer
  0.002% (2.9e-05 s): ForLoopInitRewriter
  0.006% (9.3e-05 s): FunctionHoister
  0.026% (0.00041 s): ExpressionInliner
  0.053% (0.000846 s): UnusedFunctionParameterPruner
  0.113% (0.001798 s): ForLoopConditionIntoBody
  0.125% (0.001986 s): ForLoopConditionOutOfBody
  0.134% (0.002124 s): SSAReverser
  0.146% (0.002312 s): StructuralSimplifier
  0.183% (0.002902 s): BlockFlattener
  0.228% (0.003604 s): CircularReferencesPruner
  0.332% (0.005263 s): ExpressionJoiner
  0.336% (0.005319 s): Rematerialiser
  0.367% (0.005814 s): EquivalentFunctionCombiner
  0.514% (0.008142 s): LoopInvariantCodeMotion
  0.963% (0.015248 s): FunctionSpecializer
  1.211% (0.019182 s): EqualStoreEliminator
  1.309% (0.020744 s): ExpressionSplitter
  1.360% (0.021539 s): ConditionalUnsimplifier
  1.429% (0.022642 s): ConditionalSimplifier
  1.634% (0.02588 s): ControlFlowSimplifier
  1.656% (0.026226 s): DeadCodeEliminator
  5.166% (0.081833 s): FullInliner
  5.865% (0.092911 s): UnusedStoreEliminator
  7.376% (0.116846 s): UnusedPruner
  8.498% (0.134622 s): LoadResolver
  8.557% (0.135551 s): ExpressionSimplifier
  9.066% (0.143613 s): LiteralRematerialiser
 11.084% (0.175588 s): SSATransform
 11.256% (0.178307 s): UnusedAssignEliminator
 21.004% (0.332742 s): CommonSubexpressionEliminator
--------------------------------------
    100% (1.584 s)

Apr 12 '23 13:04 nikola-matic

Also, @chriseth, do you have a suggestion for the changelog entry?

Apr 24 '23 11:04 nikola-matic

Looks good! Please squash.

May 26 '23 09:05 chriseth

Looks good! Please squash.

Done!

May 26 '23 09:05 nikola-matic

I checked how this PR affects compilation times in external tests. Here are the times for ir-optimize-evm+yul preset from run 29920 on develop (8c7404f639b19a30d0e0ace31002bb48a7a6f5c4) vs run 29928 on dataflow-analyzer-reverse-lookup (a8cc9bde9b64d44e19b59ada18412675654e5d22).

test	`develop`	this PR	difference
zeppelin	6m48.256s	5m46.622s	-1:02
colony	1m51.012s	1m27.582s	-0:24
uniswap	1m46.737s	1m40.339s	-0:06
prb-math	0m13.031s	0m10.887s	-0:03
gp2	0m44.495s	0m43.621s	-0:01
brink	0m04.842s	0m05.302s	+0:01
ens	0m45.856s	0m48.436s	+0:03
perpetual-pools	1m28.025s	1m32.749s	+0:04
yield-liquidator	0m31.450s	0m38.634s	+0:07
elementfi	2m48.506s	3m15.299s	+0:27

I wonder what's happening with ElementFi. I double checked I got the numbers from the right pages and looks like it really took ~30 s longer. May be worth checking if it's repeatable or just a fluke. Still, some other times increased too so there might be something more to it.

The table contains only the real time. Full results are below in case anyone's interested.

zeppelin
real	6m48.256s user	6m47.482s sys	0m2.189s
real	5m46.622s user	5m46.191s sys	0m1.551s

uniswap
real	1m46.737s user	1m48.668s sys	0m0.732s
real	1m40.339s user	1m41.697s sys	0m0.673s

ens
real	0m45.856s user	0m45.927s sys	0m0.656s
real	0m48.436s user	0m48.450s sys	0m0.580s

gp2
real	0m44.495s user	0m52.945s sys	0m1.442s
real	0m43.621s user	0m51.973s sys	0m1.369s

brink
real	0m4.842s user	0m4.763s sys	0m0.603
real	0m5.302s user	0m5.272s sys	0m0.556s

elementfi
real	2m48.506s user	2m56.110s sys	0m2.225s
real	3m15.299s user	3m24.894s sys	0m2.584s

perpetual-pools
real	1m28.025s user	1m30.875s sys	0m1.107s
real	1m32.749s user	1m36.234s sys	0m1.284s

prb-math
real	0m13.031s user	0m13.804s sys	0m0.469s
real	0m10.887s user	0m11.300s sys	0m0.444s

yield-liquidator
real	0m31.450s user	0m34.304s sys	0m0.567s
real	0m38.634s user	0m41.636s sys	0m0.534s

colony
real	1m51.012s user	2m17.810s sys	0m3.484s
real	1m27.582s user	1m50.964s sys	0m2.876s

May 26 '23 17:05 cameel

This pull request is stale because it has been open for 14 days with no activity. It will be closed in 7 days unless the stale label is removed.

Jun 19 '23 12:06 github-actions[bot]

I gathered fresh timing data (and translated the original data into the same format).

It seems to me that all the timing differences we see in external tests, even the positive ones may be just noise:
- The differences are much smaller now. Especially for zeppelin, elementfi and colony, which originally had the largest difference, the difference not only decreased but completely reversed direction.
- In some cases the differences flip between positive and negative in different runs for the same project.
- zeppelin and uniswap compile much faster on develop now. In case of zeppelin this could very well be due to upstream changes in tests. For uniswap though we have a forked repo.
On the other hand, there is actually a 2x difference in our own timing benchmark - though only for a single contract (chains.sol). Others, e.g. OptimizorClub.sol, do not seem affected.

My conclusion here is that there's no strong evidence that this change significantly affects external tests, be it positively or negatively. If there is a difference, it's lower than the normal variance of CI timing. Still, the change does seem to help one especially pathological contract from our repo (chains.sol). I wonder if it might be because profiling was done specifically on that contract and the PR addresses its specific bottleneck and the bottleneck happens to be different than in more common scenarios?

Original timing

Project	`develop`	this PR
brink	5 s	5 s
colony	111 s	88 s
elementfi	169 s	195 s
ens	46 s	48 s
euler
gnosis
gp2	44 s	44 s
perpetual-pools	88 s	93 s
pool-together
uniswap	107 s	100 s
yield_liquidator	31 s	39 s
zeppelin	408 s	347 s
prb-math	13 s	11 s

Current timing

Project	`develop`	this PR (run 1)	this PR (run 2)	this PR (run 3)
brink	5 s	4 s	5 s	5 s
colony	105 s	113 s	110 s	114 s
elementfi	172 s	162 s	163 s	167 s
ens	40 s	43 s	43 s	47 s
euler	52 s	59 s	62 s	72 s
gnosis
gp2	42 s	48 s	47 s	60 s
perpetual-pools	81 s	72 s	84 s	79 s
pool-together	56 s	49 s	51 s	61 s
uniswap	82 s	84 s	71 s	84 s
yield_liquidator	26 s	27 s	26 s	25 s
zeppelin	261 s	266 s	274 s	277 s
prb-math

Timing diff with `develop`

Project	Diff (original)	Diff (run 1)	Diff (run 2)	Diff (run 3)
brink	1 s	-1 s	0 s	0 s
colony	-24 s	8 s	5 s	9 s
elementfi	27 s	-10 s	-9 s	-5 s
ens	3 s	3 s	3 s	7 s
euler		7 s	10 s	20 s
gnosis
gp2	-1 s	6 s	5 s	18 s
perpetual-pools	4 s	-9 s	3 s	-2 s
pool-together		-7 s	-5 s	5 s
uniswap	-6 s	2 s	-11 s	2 s
yield_liquidator	7 s	1 s	0 s	-1 s
zeppelin	-62 s	5 s	13 s	16 s
prb-math	-3 s

Timing benchmark

`develop`

Binary from b_ubu_static from the last run on develop

File	Pipeline	Bytecode size	Time	Exit code
`verifier.sol`	legacy	4940 bytes	0.20 s	0
`verifier.sol`	via-ir	4417 bytes	0.66 s	0
`OptimizorClub.sol`	legacy	0 bytes	0.52 s	1
`OptimizorClub.sol`	via-ir	22391 bytes	4.05 s	0
`chains.sol`	legacy	5878 bytes	0.17 s	0
`chains.sol`	via-ir	23076 bytes	23.16 s	0

this PR

Binary from b_ubu_static from the last run on dataflow-analyzer-reverse-lookup

File	Pipeline	Bytecode size	Time	Exit code
`verifier.sol`	legacy	4940 bytes	0.14 s	0
`verifier.sol`	via-ir	4417 bytes	0.65 s	0
`OptimizorClub.sol`	legacy	0 bytes	0.57 s	1
`OptimizorClub.sol`	via-ir	22391 bytes	4.23 s	0
`chains.sol`	legacy	5878 bytes	0.17 s	0
`chains.sol`	via-ir	23076 bytes	10.52 s	0

Apr 19 '24 20:04 cameel

This PR has been open for more than a year, with the improvements somewhat inconclusive. I'll be closing it, at least for the time being, since we're working on different (macro) improvements on the via-ir pipeline, which should yield significantly better results. We can re-open this in the future if need be, or use it as a reference at the very least.

Aug 21 '24 11:08 nikola-matic

solidity solidity copied to clipboard

Reverse lookup for DataFlowAnalyzer

develop (baseline)

with reverse lookup (this)

Original timing

Current timing

Timing diff with develop

Timing benchmark

develop

this PR

solidity
solidity copied to clipboard

Timing diff with `develop`

`develop`