chore: drop custom bitvector notation
This PR removes the custom notation we had for signed & unsigned comparisons of bitvectors.
We still need to regenerate the extracted goals
This looks like a great idea. But why do we have so much snakemake here?
This looks like a great idea. But why do we have so much snakemake here?
Because I picked the wrong place to start my branch, it's now rebased
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /code/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /code/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
This breaks CI, but is fine for me. As I am currently on leave, maybe @bollu can review this. This patch seems to go in a direction he is also interested in.
Yes, happy to help here @alexkeizer , as this would help me create cleaner goal states as well
@bollu With our fix this is now building properly, could you run the extraction scripts and update the goals files to make the extract goals CI happy?
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /code/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%