Metatheory.jl icon indicating copy to clipboard operation
Metatheory.jl copied to clipboard

Change implementation of BackoffScheduler to match egg.

Open gkronber opened this issue 1 year ago • 7 comments

For valid performance comparisons, with egg we should produce egraphs that have comparable sizes to the egraphs produced by egg. Currently, this is not the case because MT has a bug when using BackoffScheduler (informed with incorrect number of matches). Additionally, the implementation of BackoffScheduler is different to egg, leading to different egraph sizes (for the same rules, and saturation timeout).

egg allows to limit the number of matches in the ematcher to the threshold calculated from the BackoffScheduler which has additional performance advantages.

Fixes #248.

gkronber avatar Oct 05 '24 13:10 gkronber

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 77.65957% with 21 lines in your changes missing coverage. Please review.

Project coverage is 80.26%. Comparing base (081a9e6) to head (bfe573e). Report is 2 commits behind head on ale/3.0.

Files with missing lines Patch % Lines
src/EGraphs/Schedulers.jl 70.83% 21 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           ale/3.0     #249      +/-   ##
===========================================
- Coverage    81.17%   80.26%   -0.92%     
===========================================
  Files           19       18       -1     
  Lines         1503     1535      +32     
===========================================
+ Hits          1220     1232      +12     
- Misses         283      303      +20     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 05 '24 13:10 codecov-commenter

Benchmark Results

egg-sym egg-cust MT@4530aaea79e... MT@d93e4fe4a64... egg-sym/MT@453... egg-cust/MT@45... MT@d93e4fe4a64...
egraph_addexpr 1.45 ms 5.13 ms 5.26 ms 0.282 1.03
basic_maths_simpl2 13.8 ms 5.13 ms 14.7 ms 20.9 ms 0.938 0.349 1.42
prop_logic_freges_theorem 2.55 ms 1.56 ms 1.28 ms 1.06 ms 1.98 1.21 0.824
calc_logic_demorgan 60.9 μs 34.2 μs 90.5 μs 76.8 μs 0.673 0.378 0.849
calc_logic_freges_theorem 22.5 ms 11.6 ms 78.4 ms 46.7 ms 0.287 0.148 0.596
basic_maths_simpl1 6.42 ms 2.87 ms 12.9 ms 4.72 ms 0.499 0.223 0.367
egraph_constructor 0.0846 μs 0.0935 μs 0.0969 μs 0.905 1.04
prop_logic_prove1 36.5 ms 14.2 ms 89.6 ms 42.5 ms 0.407 0.159 0.474
prop_logic_demorgan 79.9 μs 45.4 μs 111 μs 93.9 μs 0.721 0.41 0.847
while_superinterpreter_while_10 19.1 ms 18.3 ms 0.96
prop_logic_rewrite 122 μs 122 μs 0.996
time_to_load 115 ms 118 ms 1.03

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

github-actions[bot] avatar Oct 05 '24 13:10 github-actions[bot]

Explanation for the longer runtimes in the benchmark: before the fix the BackoffScheduler quickly blocked all rules because it was informed with the sum of number of matches over all rules. In the fixed version, the BackoffScheduler blocks rules later (and individually), which leads to larger graph and therefore longer runtime.

We should compare the sizes of the egraphs for all implementations in the benchmark. When the scheduler parameters are the same and the rules are the same then the egraphs (using egg and MT) should be the same after saturation. Otherwise comparing the runtimes is not particularly useful.

I changed the egg-benchmark and MT benchmark script below to produce egraph sizes.

gkronber avatar Oct 05 '24 14:10 gkronber

Further investigation shows that the implementation of BackoffScheduler in MT differs significantly from the implementation in egg.

In egg, the search_rewrite function of schedulers determines how matches for rule are searched https://github.com/egraphs-good/egg/blob/1b2d004f63a01256047154f51568e61317cd4e89/src/run.rs#L935 which gives much more flexibility for scheduling than the implementation in MT, where eqsat_search! communicates with schedulers solely via cansearch and inform! . An important performance feature is that BackoffScheduler in egg uses the threshold to limit the number of matches in the ematcher.

It is easy to add something similar to MT but requires a change to the interface for AbstractScheduler. I see two options:

  • We could change cansearch to a function returning the limit for the number of matches (matchlimit) and leave the structure of searching over eclasses as is. (as in 7dd58487eb91916c9e60d51f45722a797ad7c94c).
  • We could implement search_rewrite similarly to egg, which opens up capabilities to implement smarter schedulers. (as in 0fecaa569c4ec4badcd34fcf79ec3e8576569d11). I prefer this solution.

gkronber avatar Oct 06 '24 07:10 gkronber

For future reference, egg-benchmark produces the following numbers for egg-sym and the numbers for MT 0fecaa569c4ec4badcd34fcf79ec3e8576569d11 are

id egg_n_classes egg_n_memo mt_n_classes mt_n_memo note
egraph/addexpr 6771 6771 6650 6650
basic_maths/simpl1 368 2543 635 4351 -
basic_maths/simpl2 440 2836 726 3536 -
calc_logic/demorgan 16 35 12 25 sum over prove steps
calc_logic/freges_theorem 1072 17394 713 14692 sum over prove steps
prop_logic/demorgan 16 42 15 35 sum over prove steps
prop_logic/freges_theorem 316 2315 71 255 sum over prove steps
prop_logic/prove1 7448 35210 1947 16540 sum over prove steps

Differences in the number of eclasses produced by egg and MT to be investigated.

gkronber avatar Oct 07 '24 12:10 gkronber

@gkronber thanks for all these contributions! :heart_hands:

I will take a look after work

0x0f0f0f avatar Oct 11 '24 10:10 0x0f0f0f

This PR is a bit messy, because I became aware of an issue in the ematch compiler and the benchmarking code after starting to work on the issue I observed initially.

In the meanwhile I strongly support moving the matching code into the Schedulers (refactoring cansearch, inform). I believe it would be important to fix this and introduce the match limit parameter for the ematchers.

I'm happy to split this PR up into several smaller cleaner PRs if this makes it easier for reviewing.

gkronber avatar Oct 14 '24 09:10 gkronber

Ok, it seems that #252 did not really fix issues with the CAS integration tests. More matches are detected because rules are not disabled all together by the backoffscheduler anymore. The additional matches and unifications mean that the issues in the CAS tests pop up again.

The CAS test has several rules which seem problematic with divison by zero, I suspect that these are the problem. We might need to add semantic analysis to finally fix them. @0x0f0f0f is there source for the CAS tests and rules that we could compare to? Edit: I found this https://github.com/egraphs-good/egg/blob/main/tests/math.rs which seems similar.

gkronber avatar Oct 24 '24 14:10 gkronber

@0x0f0f0f if you agree, I'm going to close this PR and open a new PR and branch that only contains the changes for the schedulers and should be easier to review.

gkronber avatar Jan 20 '25 11:01 gkronber

@0x0f0f0f if you agree, I'm going to close this PR and open a new PR and branch that only contains the changes for the schedulers and should be easier to review.

Sure please go ahead. I'm sorry for the very late reply. I am catching up now. There were some design choices behind why I deliberately chose to adopt a different interface, one was performance.

Currently ale/3.0 branch is failing and will push the fix today.

0x0f0f0f avatar Jan 20 '25 11:01 0x0f0f0f

Closing this PR as we will handle the changes in separate smaller PRs

gkronber avatar Jan 23 '25 08:01 gkronber