Add Hermes' Canonicalization Optimization Pass

Open sebholl opened this issue 2 years ago • 1 comments

Summary

In response to the discussion with @tmikov around https://github.com/facebook/hermes/pull/1071, this PR introduces a brand new canonicalization step to the optimizer. The purpose of the canonicalization step is to standardize the syntax for code with the same semantics.

In the first implementation, we perform three basic canonicalization steps:

We move the constant/literal operator evaluations from InstSimplify to InstCanonicalize. I felt this important in the canonicalization pass as it normalizes the various ways that minifiers represent literals into their unobfiscated form (e.g. void 0 becomes undefined; !0 becomes true and !1 becomes false).
We flip constant/literal expressions to the right-hand side of binary expressions. This opens up more opportunity for CSE. Note, we could flip literals to the LHS but empirically this lead to worse optimization outcomes.
We normalize all nulls in != and == operators to undefined (one of the optimization opportunities originating in https://github.com/facebook/hermes/pull/1071). Again this could have equally been flipped to map undefined to nulls but as documented in that PR, null lead to worse optimization outcomes.

As for the pipeline, I propose running the canonicalization step before every InstSimplify run as well as before the CSE step. This has yielded promising results so far. Introducing these extra passes has increased compilation time by about 2% but from the conversation with @tmikov, it appears this is acceptable.

Clean-Up

For BinaryOperatorInst::tryGetReverseOperator(...), we remove + from the list of reversible operators as I ran into the pitfall of + operating in "concatenation" mode if one of its runtime operands is a string which clearly isn't "reversible". This change is assumed safe as it had no existing callers in the Hermes repo. I don't know if this function is linked to from other libraries, but we can land the PR without this change if preferred.
As (hopefully) expected from canonicalization approach 2, we can now simplify InstSimplify::simplifyBinOp(...) pattern matching for (x | 0) and (x | "") cases to eliminate their symmetric checsk against (0 | x) and ("" | x).

Test Plan

We trial different variations of the implementations on a 33MB production app React Native JS bundle and compare it to the base case of "No Change"/"No Change" with the full -O optimization process:

Binary Operator Literals	`==`/`!=` Null/Undefined	`-dump-bytecode` Line Count	`-dump-lir` Line Count	`-emit-binary` Size (bytes)	Execution Time (s)†
To The Right	Always Undefined	3,938,853	2,726,204	20,121,992	308.28
No Change	Always Undefined	3,939,077	2,726,205	20,127,753	307.84
To The Left	Always Undefined	3,939,233	2,726,251	20,128,289	306.27
To The Right	No Change	3,943,062	2,729,947	20,133,420	306.21
No Change	No Change	3,943,299	2,729,948	20,139,284	303.03
To The Left	No Change	3,943,455	2,729,994	20,139,812	307.21

†Execution time is measured on an M2 MacBook Air with all processes running in parallel on separate cores.

In all observed cases, swapping literals to the right always resulted in better optimization.

_Tangent: Curiously, swapping literals to the left resulted in consistently worse performance across all metrics (actually worse than doing no normalization). It seems to be not applying the AddString simplification optimizations.

Finally, we introduce a new canonicalize.js test file and run...

% cmake --build ../build --target check-hermes all -j 4 
[25/26] Running the Hermes regression tests
Testing Time: 20.31s
  Expected Passes    : 1682
  Unsupported Tests  : 62

Jul 30 '23 20:07 sebholl

@tmikov - any comment on these PRs?

Aug 09 '23 01:08 shollington-rbi