lean-mlir icon indicating copy to clipboard operation
lean-mlir copied to clipboard

(WIP) Evaluate generalizer

Open oluwatimilehin opened this issue 7 months ago • 5 comments

oluwatimilehin avatar May 05 '25 18:05 oluwatimilehin

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 05 '25 18:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 05 '25 22:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 05 '25 23:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 06 '25 14:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 06 '25 14:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 07 '25 10:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 07 '25 11:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 07 '25 17:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 07 '25 18:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 07 '25 23:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 08 '25 09:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 09 '25 00:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 10 '25 11:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 10 '25 14:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 10 '25 22:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 11 '25 00:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 12 '25 08:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 13 '25 09:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 17 '25 19:05 github-actions[bot]

bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 5 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean bitwuzla proved and leanSAT failed theorem 15 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 38 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 2 in file gsdivhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 3 in file gapinthdiv2_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean leanSAT and Bitwuzla solved: 7929 leanSAT and Bitwuzla provided 8 counterexamples leanSAT and Bitwuzla both failed on 15 theorems leanSAT failed and Bitwuzla succeeded on 22 theorems leanSAT succeeded and Bitwuzla failed on 0 theorems There were 0 inconsistencies Errors raised: 79 ran rg 'Bitwuzla failed' | wc -l, expected 15, found 0, FAIL ran rg 'LeanSAT failed' | wc -l, expected 37, found 0, FAIL ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'LeanSAT proved' | wc -l, expected 7929, found 0, FAIL ran rg 'Bitwuzla proved' | wc -l, expected 7951, found 0, FAIL error (kernel) deep recursion detected was raised 25 times error The SAT solver timed out while solving the problem. was raised 39 times error The SMT solver timed out while solving the problem. was raised 15 times

github-actions[bot] avatar May 17 '25 19:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 19 '25 10:05 github-actions[bot]

Alive Statistics: 90 / 93 (3 failed)

github-actions[bot] avatar May 22 '25 10:05 github-actions[bot]

bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 5 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean bitwuzla proved and leanSAT failed theorem 15 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 38 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 2 in file gsdivhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 3 in file gapinthdiv2_proof.lean bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean leanSAT and Bitwuzla solved: 7931 leanSAT and Bitwuzla provided 8 counterexamples leanSAT and Bitwuzla both failed on 15 theorems leanSAT failed and Bitwuzla succeeded on 20 theorems leanSAT succeeded and Bitwuzla failed on 0 theorems There were 0 inconsistencies Errors raised: 77 ran rg 'Bitwuzla failed' | wc -l, expected 15, found 0, FAIL ran rg 'LeanSAT failed' | wc -l, expected 35, found 0, FAIL ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'LeanSAT proved' | wc -l, expected 7931, found 0, FAIL ran rg 'Bitwuzla proved' | wc -l, expected 7951, found 0, FAIL error (kernel) deep recursion detected was raised 25 times error The SAT solver timed out while solving the problem. was raised 37 times error The SMT solver timed out while solving the problem. was raised 15 times

github-actions[bot] avatar May 22 '25 10:05 github-actions[bot]

@oluwatimilehin Thank you for working on the evaluation! Do you feel that the code is in a mergeable state now? If so, please do merge it, and work on the next improvement in a separate PR. Generally, long-running PRs become hard to merge and to review, so I'd suggest stopping this one at a logical point, merging it in, and then continuing work in another PR.

bollu avatar May 22 '25 11:05 bollu

@oluwatimilehin aha, my bad! I meant that we should change the status to "ready for review", to then ask for review from one of us :)

bollu avatar May 22 '25 12:05 bollu

@oluwatimilehin I will do a review now, post-commit, so we can cleanup the PR. In general, I think we should submit a PR that cleans up the implementation in the following ways:

  1. We make sure that all our functions are commented.
  2. We clearly explain each sorry.
  3. We remove all commented / dead code.
  4. We guard every #eval or #generalize with guard_msgs. May I have a PR titled "cleanup generalizer" that does this?

bollu avatar May 22 '25 12:05 bollu