
feat: Blase NS multi-machine batch

Open bollu opened this issue 3 months ago • 18 comments

We will use snakemake's kubernetes executor: https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/kubernetes.html

along with namespace's kubernetes cluster support:

  • https://namespace.so/docs/reference/github-actions/nscloud-cluster-action

to run our snakemake setup directly on Namespace. The open question for me is where this will store the build artifacts. Currently, our evaluation run uploads a ZIP file to Zulip. We could instead upload this ZIP file as a build artifact on GitHub.

bollu avatar Sep 22 '25 14:09 bollu
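The ZIP step mentioned above could be sketched as follows. This is a minimal illustration, not the project's actual script: the function name `bundle_results` and the directory layout in the demo are hypothetical, and the resulting archive would still need to be handed to whatever upload step the workflow uses.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_results(results_dir: Path, zip_path: Path) -> list[str]:
    """Archive every file under results_dir into zip_path.

    Returns the archive member names in sorted order. zip_path should live
    outside results_dir, otherwise the archive would try to include itself.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(results_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the results root, with forward slashes.
                arcname = path.relative_to(results_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


if __name__ == "__main__":
    # Demo on a throwaway directory; the file names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "raw-data"
        (root / "InstCombine").mkdir(parents=True)
        (root / "InstCombine" / "example.csv").write_text("col_a,col_b\n")
        print(bundle_results(root, Path(tmp) / "evaluation.zip"))
```

Keeping the archive a plain ZIP means the same file could go to either Zulip or a GitHub build artifact without changing the evaluation itself.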

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions[bot] avatar Sep 22 '25 14:09 github-actions[bot]
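A report like the one above, where both solvers saw 0 problems out of 4520 theorems, suggests the run produced no results to aggregate. The sketch below shows one way such an empty run could be detected automatically. The helper names `parse_counts` and `is_empty_run` are hypothetical, and the regular expressions are written against the report wording shown in this thread only.

```python
import re

# Patterns match the phrasing of the bot report quoted in this thread.
COUNT_PATTERNS = {
    "bv_decide_solved": r"bv_decide solved (\d+) theorems",
    "bitwuzla_solved": r"bitwuzla solved (\d+) theorems",
    "bv_decide_seen": r"In total, bv_decide saw (\d+) problems",
    "bitwuzla_seen": r"In total, bitwuzla saw (\d+) problems",
    "benchmark_total": r"benchmark contains (\d+) theorems in total",
}


def parse_counts(report: str) -> dict[str, int]:
    """Extract the integer counts that appear in a report; skip missing ones."""
    counts = {}
    for key, pattern in COUNT_PATTERNS.items():
        match = re.search(pattern, report)
        if match:
            counts[key] = int(match.group(1))
    return counts


def is_empty_run(counts: dict[str, int]) -> bool:
    """True when neither solver saw any problem, so timing stats would be undefined."""
    return counts.get("bv_decide_seen", 0) == 0 and counts.get("bitwuzla_seen", 0) == 0


if __name__ == "__main__":
    sample = (
        "bv_decide solved 0 theorems. bitwuzla solved 0 theorems. "
        "In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. "
        "The InstCombine benchmark contains 4520 theorems in total."
    )
    counts = parse_counts(sample)
    print(counts, "empty run:", is_empty_run(counts))
```

A check of this kind could fail the job early instead of posting a report full of nan values.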

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 07:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 07:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 07:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 08:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 08:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 08:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 24 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 26 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 26 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 26 '25 09:09 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Sep 26 '25 09:09 github-actions[bot]

!blase

bollu avatar Oct 07 '25 22:10 bollu

!blase

bollu avatar Oct 07 '25 22:10 bollu

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Oct 07 '25 22:10 github-actions[bot]

(Same report as the previous github-actions[bot] comment: every count is 0 and every timing statistic is nan.)

github-actions[bot] avatar Oct 07 '25 23:10 github-actions[bot]