feat: snakemakeify hacker's delight
This shows how we would use snakemake to gradually migrate our plotting pipeline for hacker's delight.
See that snakemake now subsumes the (hard) job of task running in collect.py.
It leaves the CSV wranging in collect.py and plot.py untouched, as these have been vetted by @luisacicolini .
This provides parallelism at the task level, and I would like to believe, is much more declarative on how the hacker's delight files are built.
How would the run_with_limits function be used?
It would be cool if we could hook it up with snakemake's notion of limits
@ineol I added an example. Sadly, I learnt today that snakemake's notion of limits are not imposed by the runner, but are only used by the scheduler :) Regardless, we can query those parameters and use then in our call.
@ineol , if I could get a thumbs up from you for the config, that would be great :)
What does the python thing do in the first rule? Does it update the venv when the requirements.txt change or just creates it if it does not exist?
Relatedly, can we use pip to install snakemake as well?
@ineol Indeed, we can use pip, so I've added it into the requirements.txt.
NIT: I'd expect some clean target to get rid of produced plots (although I'm not sure if we'd want it to clean the built core oleans or not)
name 'run_with_limits' is not defined
:cry:
In general, I'm not a big fan of the fact that when something breaks, it vomits up heaps of non-informative "thing broke because something returned a non-zero status code" logs. Just show me the actual error!
Just look at the signal/noise ratio in the following log!
NameError in file "/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile", line 57:
name 'run_with_limits' is not defined
File "/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile", line 57, in __rule_hdel_compare_make_output
Exiting because a job execution failed. Look below for error messages
WorkflowError:
At least one job did not complete successfully.
Select jobs to execute...
Traceback (most recent call last):
File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 232, in spawn_job
subprocess.check_call(cmd, shell=True)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/lib/python3.13/subprocess.py", line 419, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cd /home/alex/Workspace/PhD/lean-mlir/bv-evaluation && /home/alex/Workspace/PhD/lean-mlir/.venv/bin/python3 -m snakemake --snakefile '/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile' --target-jobs 'hdel_compare_make_output:file=ch2_2AdditionAndLogicalOps,width=64,r=1' --allowed-rules hdel_compare_make_output --cores 1 --attempt 1 --force-use-threads --force --target-files-omit-workdir-adjustment --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers software-env mtime input params code --conda-frontend 'conda' --shared-fs-usage sources persistence storage-local-copies software-deployment input-output source-cache --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 5 --scheduler 'greedy' --local-storage-prefix base64//LnNuYWtlbWFrZS9zdG9yYWdl --scheduler-solver-path '/home/alex/Workspace/PhD/lean-mlir/.venv/bin' --default-resources base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= --quiet progress rules host --mode 'subprocess' --local-groupid 'local'' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/lib/python3.13/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 247, in cached_or_run
run_func(*args)
~~~~~~~~^^^^^^^
File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 234, in spawn_job
raise SpawnedJobError()
snakemake.exceptions.SpawnedJobError
[Tue Aug 26 09:42:02 2025]
Error in rule hdel_compare_make_output:
message: None
jobid: 17
input: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_2.lean
output: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_r1.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look below for error messages
[Tue Aug 26 09:42:02 2025]
Error in rule hdel_compare_make_output:
message: None
jobid: 17
input: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_2.lean
output: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_r1.txt
Complete log(s): /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/.snakemake/log/2025-08-26T094200.842934.snakemake.log
WorkflowError:
At least one job did not complete successfully.
@alexkeizer @ineol @luisacicolini , I would appreciate it if you could try this again. Please run
cd lean-mlir/bv-evaluation;./toplevel-hackersdelight.sh
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
I tried to install bitwuzla but it could not download gmp because ftp.gnu.org sems dead. Bitwuzla is an implicit dependency of the snakemake file.
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
Also @bollu could you double check what metadata you generate, and add it to .gitignore. I got a bunch of stuff under .snakemake, and also __pycache__ (but I'm not sure how much of that was us trying to get Nix to behave)
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
While running inside the
nixdev-shell, it dies with the following error:Traceback (most recent call last): File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/cli.py", line 2165, in args_to_api dag_api.execute_workflow( File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/api.py", line 603, in execute_workflow workflow.execute( File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/workflow.py", line 1405, in execute raise e File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/workflow.py", line 1401, in execute success = self.scheduler.schedule() ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 356, in schedule raise e File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 200, in schedule self._finish_jobs() File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 432, in _finish_jobs async_run(postprocess()) File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/common/__init__.py", line 99, in async_run return asyncio.run(coroutine) ^^^^^^^^^^^^^^^^^^^^^^ File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 375, in postprocess await job.postprocess( File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/jobs.py", line 1260, in postprocess await self.dag.workflow.persistence.finished(self) File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 317, in finished params = self._params(job) ^^^^^^^^^^^^^^^^^ File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 631, in _params return sorted( ^^^^^^^ File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 634, in <genexpr> (self._serialize_param(value) for value in job.non_derived_params), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 623, in _serialize_param_pandas import pandas as pd File "/nix/store/27drr45ziv3kbjl4pdzdh1cic7ncvpp2-python3-3.13.4-env/lib/python3.13/site-packages/pandas/__init__.py", line 19, in <module> raise ImportError( ImportError: Unable to import required dependencies: numpy: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.
This was a nix-skill-issue: the posted error is really saying that numpy failed to load the C extension, which is presumably caused by nix being nix. The fix was to ensure that pythonPackages312.numpy is installed in flake.nix (note that other dependencies are fine to be uv-installed, it's only this one that is problematic!
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
It seems the mathlib cache is download twice for me:
Using cache from origin: leanprover-community/mathlib4-nightly-testing
Attempting to download 7045 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 7045/7045 = 100%] (0% success)
Attempting to download 7045 file(s) from leanprover-community/mathlib4-nightly-testing cache
Downloaded: 2162 file(s) [attempted 2162/7045 = 30%]
bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%
It seems the mathlib cache is download twice for me:
Using cache from origin: leanprover-community/mathlib4-nightly-testing Attempting to download 7045 file(s) from leanprover-community/mathlib4 cache Downloaded: 0 file(s) [attempted 7045/7045 = 100%] (0% success) Attempting to download 7045 file(s) from leanprover-community/mathlib4-nightly-testing cache Downloaded: 2162 file(s) [attempted 2162/7045 = 30%]
I merged the latest origin/main. Does this help?
After tweaking flake.nix to install numpy for the specific python version that snakemake wants, this now works on my side! LGTM
@tobiasgrosser I have an LGTM from Alex, so I propose to merge this.