lean-mlir feat: snakemakeify hacker's delight

This shows how we would use snakemake to gradually migrate our plotting pipeline for hacker's delight.

See that snakemake now subsumes the (hard) job of task running in collect.py. It leaves the CSV wranging in collect.py and plot.py untouched, as these have been vetted by @luisacicolini .

This provides parallelism at the task level, and I would like to believe, is much more declarative on how the hacker's delight files are built.

Aug 24 '25 12:08 bollu

How would the run_with_limits function be used?

Aug 24 '25 14:08 ineol

It would be cool if we could hook it up with snakemake's notion of limits

Aug 24 '25 14:08 ineol

@ineol I added an example. Sadly, I learnt today that snakemake's notion of limits are not imposed by the runner, but are only used by the scheduler :) Regardless, we can query those parameters and use then in our call.

Aug 24 '25 17:08 bollu

@ineol , if I could get a thumbs up from you for the config, that would be great :)

Aug 25 '25 09:08 bollu

What does the python thing do in the first rule? Does it update the venv when the requirements.txt change or just creates it if it does not exist?

Relatedly, can we use pip to install snakemake as well?

Aug 25 '25 11:08 ineol

@ineol Indeed, we can use pip, so I've added it into the requirements.txt.

Aug 26 '25 08:08 bollu

NIT: I'd expect some clean target to get rid of produced plots (although I'm not sure if we'd want it to clean the built core oleans or not)

Aug 26 '25 08:08 alexkeizer

name 'run_with_limits' is not defined

:cry:

Aug 26 '25 08:08 alexkeizer

In general, I'm not a big fan of the fact that when something breaks, it vomits up heaps of non-informative "thing broke because something returned a non-zero status code" logs. Just show me the actual error!

Just look at the signal/noise ratio in the following log!

NameError in file "/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile", line 57:
name 'run_with_limits' is not defined
  File "/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile", line 57, in __rule_hdel_compare_make_output
Exiting because a job execution failed. Look below for error messages
WorkflowError:
At least one job did not complete successfully.
Select jobs to execute...
Traceback (most recent call last):

  File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 232, in spawn_job
    subprocess.check_call(cmd, shell=True)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "/nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/lib/python3.13/subprocess.py", line 419, in check_call
    raise CalledProcessError(retcode, cmd)

subprocess.CalledProcessError: Command 'cd /home/alex/Workspace/PhD/lean-mlir/bv-evaluation && /home/alex/Workspace/PhD/lean-mlir/.venv/bin/python3 -m snakemake --snakefile '/home/alex/Workspace/PhD/lean-mlir/bv-evaluation/Snakefile' --target-jobs 'hdel_compare_make_output:file=ch2_2AdditionAndLogicalOps,width=64,r=1' --allowed-rules hdel_compare_make_output --cores 1 --attempt 1 --force-use-threads  --force --target-files-omit-workdir-adjustment --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers software-env mtime input params code --conda-frontend 'conda' --shared-fs-usage sources persistence storage-local-copies software-deployment input-output source-cache --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 5 --scheduler 'greedy' --local-storage-prefix base64//LnNuYWtlbWFrZS9zdG9yYWdl --scheduler-solver-path '/home/alex/Workspace/PhD/lean-mlir/.venv/bin' --default-resources base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= --quiet progress rules host --mode 'subprocess' --local-groupid 'local'' returned non-zero exit status 1.


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 247, in cached_or_run
    run_func(*args)
    ~~~~~~~~^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir/.venv/lib/python3.13/site-packages/snakemake/executors/local.py", line 234, in spawn_job
    raise SpawnedJobError()

snakemake.exceptions.SpawnedJobError

[Tue Aug 26 09:42:02 2025]
Error in rule hdel_compare_make_output:
    message: None
    jobid: 17
    input: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_2.lean
    output: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_r1.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look below for error messages
[Tue Aug 26 09:42:02 2025]
Error in rule hdel_compare_make_output:
    message: None
    jobid: 17
    input: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_2.lean
    output: /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/results/HackersDelight/ch2_2AdditionAndLogicalOps_64_r1.txt
Complete log(s): /home/alex/Workspace/PhD/lean-mlir/bv-evaluation/.snakemake/log/2025-08-26T094200.842934.snakemake.log
WorkflowError:
At least one job did not complete successfully.

Aug 26 '25 08:08 alexkeizer

@alexkeizer @ineol @luisacicolini , I would appreciate it if you could try this again. Please run

cd lean-mlir/bv-evaluation;./toplevel-hackersdelight.sh

Aug 27 '25 08:08 bollu

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 09:08 github-actions[bot]

I tried to install bitwuzla but it could not download gmp because ftp.gnu.org sems dead. Bitwuzla is an implicit dependency of the snakemake file.

Aug 27 '25 09:08 ineol

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 09:08 github-actions[bot]

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 09:08 github-actions[bot]

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 09:08 github-actions[bot]

Also @bollu could you double check what metadata you generate, and add it to .gitignore. I got a bunch of stuff under .snakemake, and also __pycache__ (but I'm not sure how much of that was us trying to get Nix to behave)

Aug 27 '25 09:08 alexkeizer

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 14:08 github-actions[bot]

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 14:08 github-actions[bot]

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 14:08 github-actions[bot]

While running inside the nix dev-shell, it dies with the following error:

Traceback (most recent call last):

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/cli.py", line 2165, in args_to_api
    dag_api.execute_workflow(

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/api.py", line 603, in execute_workflow
    workflow.execute(

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/workflow.py", line 1405, in execute
    raise e

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/workflow.py", line 1401, in execute
    success = self.scheduler.schedule()
              ^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 356, in schedule
    raise e

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 200, in schedule
    self._finish_jobs()

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 432, in _finish_jobs
    async_run(postprocess())

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/common/__init__.py", line 99, in async_run
    return asyncio.run(coroutine)
           ^^^^^^^^^^^^^^^^^^^^^^

  File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^

  File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/alex/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/scheduler.py", line 375, in postprocess
    await job.postprocess(

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/jobs.py", line 1260, in postprocess
    await self.dag.workflow.persistence.finished(self)

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 317, in finished
    params = self._params(job)
             ^^^^^^^^^^^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 631, in _params
    return sorted(
           ^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 634, in <genexpr>
    (self._serialize_param(value) for value in job.non_derived_params),
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/alex/Workspace/PhD/lean-mlir-alt/bv-evaluation/.venv/lib/python3.12/site-packages/snakemake/persistence.py", line 623, in _serialize_param_pandas
    import pandas as pd

  File "/nix/store/27drr45ziv3kbjl4pdzdh1cic7ncvpp2-python3-3.13.4-env/lib/python3.13/site-packages/pandas/__init__.py", line 19, in <module>
    raise ImportError(

ImportError: Unable to import required dependencies:
numpy: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

This was a nix-skill-issue: the posted error is really saying that numpy failed to load the C extension, which is presumably caused by nix being nix. The fix was to ensure that pythonPackages312.numpy is installed in flake.nix (note that other dependencies are fine to be uv-installed, it's only this one that is problematic!

Aug 27 '25 18:08 alexkeizer

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 19:08 github-actions[bot]

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 19:08 github-actions[bot]

It seems the mathlib cache is download twice for me:

Using cache from origin: leanprover-community/mathlib4-nightly-testing
Attempting to download 7045 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 7045/7045 = 100%] (0% success)
Attempting to download 7045 file(s) from leanprover-community/mathlib4-nightly-testing cache
Downloaded: 2162 file(s) [attempted 2162/7045 = 30%]

Aug 27 '25 19:08 tobiasgrosser

bv_decide solved 0 theorems. bitwuzla solved 0 theorems. bv_decide found 0 counterexamples. bitwuzla found 0 counterexamples. bv_decide only failed on 0 problems. bitwuzla only failed on 0 problems. both bitwuzla and bv_decide failed on 0 problems. In total, bitwuzla saw 0 problems. In total, bv_decide saw 0 problems. ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS The InstCombine benchmark contains 4520 theorems in total. Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan mean of percentage stddev/av: nan%

Aug 27 '25 23:08 github-actions[bot]

It seems the mathlib cache is download twice for me:

Using cache from origin: leanprover-community/mathlib4-nightly-testing
Attempting to download 7045 file(s) from leanprover-community/mathlib4 cache
Downloaded: 0 file(s) [attempted 7045/7045 = 100%] (0% success)
Attempting to download 7045 file(s) from leanprover-community/mathlib4-nightly-testing cache
Downloaded: 2162 file(s) [attempted 2162/7045 = 30%]

I merged the latest origin/main. Does this help?

Aug 27 '25 23:08 bollu

After tweaking flake.nix to install numpy for the specific python version that snakemake wants, this now works on my side! LGTM

Aug 28 '25 11:08 alexkeizer

@tobiasgrosser I have an LGTM from Alex, so I propose to merge this.

Aug 28 '25 13:08 bollu