Metatheory.jl icon indicating copy to clipboard operation
Metatheory.jl copied to clipboard

Update benchmarks to report the egraph size for each case

Open gkronber opened this issue 9 months ago • 9 comments

... and check he return value assertion only once.

This is useful to compare the behaviour of MT to egg additionally to the runtime.

gkronber avatar Feb 18 '25 13:02 gkronber

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.42%. Comparing base (32786c0) to head (641212a).

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           ale/3.0     #265   +/-   ##
========================================
  Coverage    81.42%   81.42%           
========================================
  Files           19       19           
  Lines         1491     1491           
========================================
  Hits          1214     1214           
  Misses         277      277           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 18 '25 14:02 codecov-commenter

Benchmark Results

egg-sym egg-cust MT@641212a8e55... MT@32786c06710... egg-sym/MT@641... egg-cust/MT@64... MT@32786c06710...
egraph_addexpr 1.45 ms 6.16 ms 6.33 ms 0.236 1.03
basic_maths_simpl2 14 ms 5.16 ms 25.8 ms 25.6 ms 0.543 0.2 0.992
prop_logic_freges_theorem 2.55 ms 1.56 ms 2.43 ms 2.41 ms 1.05 0.642 0.991
calc_logic_demorgan 60.8 μs 34.4 μs 81.2 μs 80.2 μs 0.748 0.423 0.988
calc_logic_freges_theorem 24 ms 11.6 ms 49 ms 47.3 ms 0.49 0.237 0.964
basic_maths_simpl1 6.51 ms 2.89 ms 4.93 ms 4.89 ms 1.32 0.587 0.992
egraph_constructor 0.0846 μs 0.106 μs 0.11 μs 0.8 1.04
prop_logic_prove1 37.5 ms 14.6 ms 45.4 ms 43 ms 0.827 0.321 0.949
prop_logic_demorgan 79.9 μs 45.5 μs 98.8 μs 95.5 μs 0.809 0.46 0.967
while_superinterpreter_while_10 19.1 ms 19.9 ms 1.04
prop_logic_rewrite 117 μs 114 μs 0.973
time_to_load 132 ms 128 ms 0.971

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

github-actions[bot] avatar Feb 18 '25 15:02 github-actions[bot]

Using the updated scripts from https://github.com/nmheim/egg-benchmark/pull/10 this should also produce the egraph size. Probably, we still need to update the CI script to extract the second table.

It should look similar to the following. Values in the second table are (num_classes, num_nodes, length(memo)). Edit: updated for 641212a (after fixing inconsistencies in the benchmarks)

|                                 | egg-sym | egg-cust | MT@641212a | MT@ale/3.0 | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:-------:|:--------:|:----------:|:----------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr                  | 1.62 ms |          | 5.38 ms    | 5.05 ms    | 0.3               |                   | 0.937             |
| basic_maths_simpl2              | 12 ms   | 5.05 ms  | 27.5 ms    | 24.7 ms    | 0.438             | 0.183             | 0.898             |
| prop_logic_freges_theorem       | 2.56 ms | 1.61 ms  | 2.42 ms    | 2.31 ms    | 1.06              | 0.664             | 0.955             |
| calc_logic_demorgan             | 64.1 μs | 36.8 μs  | 80.2 μs    | 78.8 μs    | 0.799             | 0.458             | 0.983             |
| calc_logic_freges_theorem       | 23 ms   | 11.7 ms  | 44.3 ms    | 42.3 ms    | 0.518             | 0.263             | 0.953             |
| basic_maths_simpl1              | 5.77 ms | 2.9 ms   | 4.99 ms    | 4.6 ms     | 1.16              | 0.582             | 0.922             |
| egraph_constructor              | 0.13 μs |          | 0.1 μs     | 0.1 μs     | 1.3               |                   | 1                 |
| prop_logic_prove1               | 38 ms   | 14.6 ms  | 44.7 ms    | 39 ms      | 0.851             | 0.327             | 0.873             |
| prop_logic_demorgan             | 89.9 μs | 49.7 μs  | 98.8 μs    | 97 μs      | 0.91              | 0.503             | 0.982             |
| while_superinterpreter_while_10 |         |          | 19.7 ms    | 18.8 ms    |                   |                   | 0.954             |
| prop_logic_rewrite              |         |          | 131 μs     | 116 μs     |                   |                   | 0.891             |
| time_to_load                    |         |          | 121 ms     | 111 ms     |                   |                   | 0.92              |

[ Info: Saving table at output.md
|                                 | egg-sym          | egg-cust         | MT@641212a       | MT@ale/3.0       | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:----------------:|:----------------:|:----------------:|:----------------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr                  | 6762 6762 6762   |                  | 6663 6663 6663   | 6640 6640 6640   |                   |                   |                   |
| basic_maths_simpl2              | 440 2235 2836    | 440 2235 2839    | 7092 14530 15001 | 7092 14530 15001 |                   |                   |                   |
| prop_logic_freges_theorem       | 316 1197 2315    | 316 1197 2322    | 297 899 1313     | 297 899 1313     |                   |                   |                   |
| calc_logic_demorgan             | 16 33 35         | 16 33 35         | 15 30 32         | 15 30 32         |                   |                   |                   |
| calc_logic_freges_theorem       | 1072 4289 17394  | 1072 4289 17280  | 1527 5402 18092  | 1527 5402 18092  |                   |                   |                   |
| basic_maths_simpl1              | 368 1910 2543    | 368 1910 2567    | 528 1815 3042    | 528 1815 3042    |                   |                   |                   |
| egraph_constructor              |                  |                  |                  |                  |                   |                   |                   |
| prop_logic_prove1               | 5510 17371 27976 | 4668 13644 18522 | 3061 9176 15637  | 3061 9176 15637  |                   |                   |                   |
| prop_logic_demorgan             | 16 35 42         | 16 35 42         | 15 30 33         | 15 30 33         |                   |                   |                   |
| while_superinterpreter_while_10 |                  |                  | 39 205 499       | 39 205 499       |                   |                   |                   |
| prop_logic_rewrite              |                  |                  |                  |                  |                   |                   |                   |
| time_to_load                    |                  |                  |                  |                  |                   |                   |                   |

~One issue that I couldn't resolve yet, is that for three benchmarks we get a reported size for ale/3.0. This cannot be correct.~ Edit: misunderstanding on my side, we are using simplify from the new benchmark.jl script, and therefore also get size results for other branches.

~I also need to double check whether the hyperparameters and benchmark implementation for egg and MT are equal.~ Edit: done

gkronber avatar Feb 18 '25 16:02 gkronber

Nice, thanks a lot! Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean? Also, for the changes to be picked up by the Github CI, we need to merge the benchmarks.jl file into master. I will do it after 3.0 merge

0x0f0f0f avatar Feb 18 '25 21:02 0x0f0f0f

Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean?

Yes, separating the two prove functions (for tests and benchmarks) is better because then we can always use the prove function from benchmarks.jl to calculate sizes even for other branches.

gkronber avatar Feb 19 '25 07:02 gkronber

Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean?

Yes, separating the two prove functions (for tests and benchmarks) is better because then we can always use the prove function from benchmarks.jl to calculate sizes even for other branches.

I think they should call each other (prove should call prove_saturate and return the bool)

0x0f0f0f avatar Feb 19 '25 07:02 0x0f0f0f

I think they should call each other (prove should call prove_saturate and return the bool)

I tried at first, but it is problematic because the benchmarks.jl script is taken from the reference revision and examples/prove.jl is taken from the benchmarked branch. I think it is better to say that benchmarks.jl has its own prove which is then the same for all branches. (We would also need to move EGraphSize and its methods into examples/prove.jl.)

gkronber avatar Feb 19 '25 08:02 gkronber

@gkronber is this ready to merge?

0x0f0f0f avatar Feb 24 '25 19:02 0x0f0f0f

@0x0f0f0f yes

gkronber avatar Feb 25 '25 07:02 gkronber