(WIP) Evaluate generalizer
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 5 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean bitwuzla proved and leanSAT failed theorem 15 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 38 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 2 in file gsdivhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 3 in file gapinthdiv2_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean leanSAT and Bitwuzla solved: 7929 leanSAT and Bitwuzla provided 8 counterexamples leanSAT and Bitwuzla both failed on 15 theorems leanSAT failed and Bitwuzla succeeded on 22 theorems leanSAT succeeded and Bitwuzla failed on 0 theorems There were 0 inconsistencies Errors raised: 79 ran rg 'Bitwuzla failed' | wc -l, expected 15, found 0, FAIL ran rg 'LeanSAT failed' | wc -l, expected 37, found 0, FAIL ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'LeanSAT proved' | wc -l, expected 7929, found 0, FAIL ran rg 'Bitwuzla proved' | wc -l, expected 7951, found 0, FAIL error (kernel) deep recursion detected was raised 25 times error The SAT solver timed out while solving the problem. was raised 39 times error The SMT solver timed out while solving the problem. was raised 15 times
Alive Statistics: 90 / 93 (3 failed)
Alive Statistics: 90 / 93 (3 failed)
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 5 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean bitwuzla proved and leanSAT failed theorem 15 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 38 in file grem_proof.lean bitwuzla proved and leanSAT failed theorem 2 in file gsdivhcanonicalize_proof.lean bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean bitwuzla provided counterexample and leanSAT failed theorem 3 in file gapinthdiv2_proof.lean bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean leanSAT and Bitwuzla solved: 7931 leanSAT and Bitwuzla provided 8 counterexamples leanSAT and Bitwuzla both failed on 15 theorems leanSAT failed and Bitwuzla succeeded on 20 theorems leanSAT succeeded and Bitwuzla failed on 0 theorems There were 0 inconsistencies Errors raised: 77 ran rg 'Bitwuzla failed' | wc -l, expected 15, found 0, FAIL ran rg 'LeanSAT failed' | wc -l, expected 35, found 0, FAIL ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL ran rg 'LeanSAT proved' | wc -l, expected 7931, found 0, FAIL ran rg 'Bitwuzla proved' | wc -l, expected 7951, found 0, FAIL error (kernel) deep recursion detected was raised 25 times error The SAT solver timed out while solving the problem. was raised 37 times error The SMT solver timed out while solving the problem. was raised 15 times
@oluwatimilehin Thank you for working on the evaluation! Do you feel that the code is in a mergeable state now? If so, please do merge it, and work on the next improvement in a separate PR. Generally, long-running PRs become hard to merge and to review, so I'd suggest stopping this one at a logical point, merging it in, and then continuing work in another PR.
@oluwatimilehin aha, my bad! I meant that we should change the status to "ready for review", to then ask for review from one of us :)
@oluwatimilehin I will do a review now, post-commit, so we can cleanup the PR. In general, I think we should submit a PR that cleans up the implementation in the following ways:
- We make sure that all our functions are commented.
- We clearly explain each
sorry. - We remove all commented / dead code.
- We guard every
#evalor#generalizewithguard_msgs. May I have a PR titled "cleanup generalizer" that does this?