LeanDojo icon indicating copy to clipboard operation
LeanDojo copied to clipboard

DojoInitError: Cannot find the *.ast.json file for Theorem

Open HorHang opened this issue 9 months ago • 8 comments

While I am trying to use LeanDjo to interact with LeanDojo programatically, below issue occurred. I think I don't have the *.ast.json file. How to get those files?

Run Code:

from lean_dojo import *
from ReProver.common import *
URL = "https://github.com/leanprover-community/mathlib4"
COMMIT = "20c73142afa995ac9c8fb80a9bb585a55ca38308"
repo = LeanGitRepo(URL, COMMIT)
theorem = Theorem(repo, thm.theorem.file_path, thm.theorem.full_name)
with Dojo(theorem) as (dojo, init_state):
       dojo.run_tac()

Error:

During handling of the above exception, another exception occurred:

DojoInitError                             Traceback (most recent call last)
Cell In[15], [line 25](vscode-notebook-cell:?execution_count=15&line=25)
     [21](vscode-notebook-cell:?execution_count=15&line=21)             unproof_theorems.append((thm_name, id))
     [23](vscode-notebook-cell:?execution_count=15&line=23)     return proof_theorems, unproof_theorems
---> [25](vscode-notebook-cell:?execution_count=15&line=25) proof_theorems, unproof_theorems = verified_all_theorems(dojo_matching_theorems[116:200])

Cell In[15], [line 16](vscode-notebook-cell:?execution_count=15&line=16)
     [13](vscode-notebook-cell:?execution_count=15&line=13) proof_ls = split_proof(commented_proof, pf_pattern)
     [14](vscode-notebook-cell:?execution_count=15&line=14) proof_steps = get_single_step_proof(proof_ls, pattern, tac_pattern)
---> [16](vscode-notebook-cell:?execution_count=15&line=16) thm_name, proof_finished = verified_single_theorem(theorem, proof_steps)
     [18](vscode-notebook-cell:?execution_count=15&line=18) if proof_finished:
     [19](vscode-notebook-cell:?execution_count=15&line=19)     proof_theorems.append((thm_name, id))

Cell In[10], [line 6](vscode-notebook-cell:?execution_count=10&line=6)
      [5](vscode-notebook-cell:?execution_count=10&line=5) def verified_single_theorem(theorem: Theorem, proof_steps: List[str]):
----> [6](vscode-notebook-cell:?execution_count=10&line=6)     with Dojo(theorem) as (dojo, init_state):
      [7](vscode-notebook-cell:?execution_count=10&line=7)       result = init_state
      [8](vscode-notebook-cell:?execution_count=10&line=8)       for t in proof_steps:

File ~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:165, in Dojo.__enter__(self)
    [163](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:163)     traced_file = self._locate_traced_file(traced_repo_path)
    [164](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:164) except FileNotFoundError:
--> [165](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:165)     raise DojoInitError(
    [166](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:166)         f"Cannot find the *.ast.json file for {self.entry} in {traced_repo_path}."
    [167](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:167)     )
    [169](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:169) self._modify_file(traced_file)
    [171](https://file+.vscode-resource.vscode-cdn.net/home/user/~/miniconda3/envs/ReProver/lib/python3.11/site-packages/lean_dojo/interaction/dojo.py:171) # Run the modified file in a container.

DojoInitError: Cannot find the *.ast.json file for Theorem(repo=LeanGitRepo(url='https://github.com/leanprover-community/mathlib4', commit='20c73142afa995ac9c8fb80a9bb585a55ca38308'), file_path=PosixPath('src/lean/Init/Data/List/Erase.lean'), full_name='List.eraseP_cons_of_pos') in projects/.cache/lean_dojo/leanprover-community-mathlib4-20c73142afa995ac9c8fb80a9bb585a55ca38308/mathlib4.

System Version:

Lake version 5.0.0-306f361 (Lean version 4.17.0)
Lean (version 4.17.0, x86_64-unknown-linux-gnu, commit 306f36116535, Release)
lean-dojo v2.2.0
Python 3.11.8

HorHang avatar Mar 07 '25 03:03 HorHang

Another thing, how could we dojo.run_tac() on multiple tactics or single line contain ;? Some seems to work while other doesn't.

Example:

  1. theorem Finset.map_nsmul_piAntidiag can run tactic \n classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn but if we split based on ; and run_tac separately, it going to run into error.
  2. Conversely, theorem CommSemiring.toSemiring_injective cannot run tactic \n rintro ⟨⟩ ⟨⟩ _; congr. However, if we split it and run separately, it going to lead to Proof_Finished.
  3. For theorem dist_le_Ico_sum_dist, how to run multiple line tactics like below:
  induction n, h using Nat.le_induction with
  | base => rw [Finset.Ico_self, Finset.sum_empty, dist_self]
  | succ n hle ihn =>
    calc
      dist (f m) (f (n + 1)) ≤ dist (f m) (f n) + dist (f n) (f (n + 1)) := dist_triangle _ _ _
      _ ≤ (∑ i ∈ Finset.Ico m n, _) + _ := add_le_add ihn le_rfl
      _ = ∑ i ∈ Finset.Ico m (n + 1), _ := by
      { rw [Nat.Ico_succ_right_eq_insert_Ico hle, Finset.sum_insert, add_comm]; simp }

HorHang avatar Mar 07 '25 05:03 HorHang

The code example is not runnable. I can take a look if you have a minimal runnable example to reproduce the error.

Usually the "Cannot find the *.ast.json file" error is caused by repos that are not properly traced (e.g., some processes were killed due to out-of-memory error when tracing the repo). Delete the traced repo from the cache and re-trace it (with larger memory if possible) may help.

yangky11 avatar Mar 14 '25 02:03 yangky11

Re

Another thing, how could we dojo.run_tac() on multiple tactics or single line contain ;? Some seems to work while other doesn't.

Example:

  1. theorem Finset.map_nsmul_piAntidiag can run tactic \n classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn but if we split based on ; and run_tac separately, it going to run into error.
  2. Conversely, theorem CommSemiring.toSemiring_injective cannot run tactic \n rintro ⟨⟩ ⟨⟩ _; congr. However, if we split it and run separately, it going to lead to Proof_Finished.

A concrete, runnable piece of code would be helpful!

  1. For theorem dist_le_Ico_sum_dist, how to run multiple line tactics like below:
  induction n, h using Nat.le_induction with
  | base => rw [Finset.Ico_self, Finset.sum_empty, dist_self]
  | succ n hle ihn =>
    calc
      dist (f m) (f (n + 1)) ≤ dist (f m) (f n) + dist (f n) (f (n + 1)) := dist_triangle _ _ _
      _ ≤ (∑ i ∈ Finset.Ico m n, _) + _ := add_le_add ihn le_rfl
      _ = ∑ i ∈ Finset.Ico m (n + 1), _ := by
      { rw [Nat.Ico_succ_right_eq_insert_Ico hle, Finset.sum_insert, add_comm]; simp }

This is a calculational proof in Lean (the calc keyword), which LeanDojo does not support.

yangky11 avatar Mar 14 '25 02:03 yangky11

The code example is not runnable. I can take a look if you have a minimal runnable example to reproduce the error.

Usually the "Cannot find the *.ast.json file" error is caused by repos that are not properly traced (e.g., some processes were killed due to out-of-memory error when tracing the repo). Delete the traced repo from the cache and re-trace it (with larger memory if possible) may help.

Thanks for your reply! Below is the code to reproduce.

from lean_dojo import *
from ReProver.common import *
URL = "https://github.com/leanprover-community/mathlib4"
COMMIT = "20c73142afa995ac9c8fb80a9bb585a55ca38308"
repo = LeanGitRepo(URL, COMMIT)
traced_repo = trace(repo)
thm = next(tt for tt in traced_repo.get_traced_theorems() if tt.theorem.full_name == "List.eraseP_cons_of_pos")
theorem = Theorem(repo, thm.theorem.file_path, thm.theorem.full_name)
with Dojo(theorem) as (dojo, init_state):
       dojo.run_tac(init_state, "simp [eraseP_cons, h]")

Initially, I suspected about the memory bottleneck as well. But the problem is still persisted even thought scaling to 220GB RAM.

Looking at the theorem name, it should belonged to lean 4 repo itself but not the target traced_repo.

HorHang avatar Mar 14 '25 04:03 HorHang

Re

Another thing, how could we dojo.run_tac() on multiple tactics or single line contain ;? Some seems to work while other doesn't. Example:

  1. theorem Finset.map_nsmul_piAntidiag can run tactic \n classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn but if we split based on ; and run_tac separately, it going to run into error.
  2. Conversely, theorem CommSemiring.toSemiring_injective cannot run tactic \n rintro ⟨⟩ ⟨⟩ _; congr. However, if we split it and run separately, it going to lead to Proof_Finished.

A concrete, runnable piece of code would be helpful!

  1. For theorem dist_le_Ico_sum_dist, how to run multiple line tactics like below:
  induction n, h using Nat.le_induction with
  | base => rw [Finset.Ico_self, Finset.sum_empty, dist_self]
  | succ n hle ihn =>
    calc
      dist (f m) (f (n + 1)) ≤ dist (f m) (f n) + dist (f n) (f (n + 1)) := dist_triangle _ _ _
      _ ≤ (∑ i ∈ Finset.Ico m n, _) + _ := add_le_add ihn le_rfl
      _ = ∑ i ∈ Finset.Ico m (n + 1), _ := by
      { rw [Nat.Ico_succ_right_eq_insert_Ico hle, Finset.sum_insert, add_comm]; simp }

This is a calculational proof in Lean (the calc keyword), which LeanDojo does not support.

Below is concrete code of above two cases.

First:

from lean_dojo import *
from ReProver.common import *
URL = "https://github.com/leanprover-community/mathlib4"
COMMIT = "20c73142afa995ac9c8fb80a9bb585a55ca38308"
repo = LeanGitRepo(URL, COMMIT)
traced_repo = trace(repo)
thm = next(tt for tt in traced_repo.get_traced_theorems() if tt.theorem.full_name == "Finset.map_nsmul_piAntidiag")
theorem = Theorem(repo, thm.theorem.file_path, thm.theorem.full_name)
tac = [\n  classical rw [map_eq_image], exact nsmul_piAntidiag _ _ hn]
with Dojo(theorem) as (dojo, init_state):
       state = dojo.run_tac(init_state, tac[0])
       state = dojo.run_tac(state, tac[1])

Second:

thm = next(tt for tt in traced_repo.get_traced_theorems() if tt.theorem.full_name == "CommSemiring.toSemiring_injective")
theorem = Theorem(repo, thm.theorem.file_path, thm.theorem.full_name)
with Dojo(theorem) as (dojo, init_state):
       state = dojo.run_tac(init_state, "\n  rintro ⟨⟩ ⟨⟩ _; congr")

HorHang avatar Mar 14 '25 04:03 HorHang

LeanDojo cannot be used to interact with theorems from the Lean 4 repo itself. See https://leandojo.readthedocs.io/en/latest/limitations.html#limitations

yangky11 avatar Mar 14 '25 14:03 yangky11

The classical in classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn is a scoping tactic (see https://github.com/haruhisa-enomoto/mathlib4-all-tactics/blob/main/all-tactics.md#classical). It's equivalent to classical (rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn) and should decompose to classical rw [map_eq_image] and classical exact nsmul_piAntidiag _ _ hn

In [4]: dojo, init_state = Dojo(theorem).__enter__()
2025-03-14 17:56:51.803 | DEBUG    | lean_dojo.interaction.dojo:__enter__:241 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/leanprover-community/mathlib4', commit='20c73142afa995ac9c8fb80a9bb585a55ca38308'), file_path=PosixPath('Mathlib/Algebra/Order/Antidiag/Pi.lean'), full_name='Finset.map_nsmul_piAntidiag')
2025-03-14 17:56:51.807 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_repo_path:219 - The traced repo is available in the cache.
2025-03-14 17:56:51.965 | DEBUG    | lean_dojo.interaction.dojo:_modify_file:342 - Modifying `Mathlib/Algebra/Order/Antidiag/Pi.lean` into `/private/home/kaiyuy/.cache/lean_dojo/leanprover-community-mathlib4-20c73142afa995ac9c8fb80a9bb585a55ca38308/mathlib4/Mathlib/Algebra/Order/Antidiag/Piptgezrgv.lean`

In [5]: dojo.run_tac(init_state, "classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn")
2025-03-14 17:57:19.177 | DEBUG    | lean_dojo.interaction.dojo:_submit_request:461 - {"sid": 0, "cmd": "classical rw [map_eq_image]; exact nsmul_piAntidiag _ _ hn"}
Out[5]: ProofFinished(tactic_state_id=1, message='')

In [6]: dojo.run_tac(init_state, "classical rw [map_eq_image]")
2025-03-14 17:58:41.056 | DEBUG    | lean_dojo.interaction.dojo:_submit_request:461 - {"sid": 0, "cmd": "classical rw [map_eq_image]"}
Out[6]: TacticState(pp="ι : Type u_1\ninst✝ : DecidableEq ι\ns : Finset ι\nm n : ℕ\nhn : n ≠ 0\n⊢ image (⇑{ toFun := fun x => n • x, inj' := ⋯ }) (s.piAntidiag m) =\n    filter (fun f => ∀ i ∈ s, n ∣ f i) (s.piAntidiag (n * m))", id=2, message='')

In [7]: s1 = _

In [8]: dojo.run_tac(s1, "exact nsmul_piAntidiag _ _ hn")
2025-03-14 17:59:12.129 | DEBUG    | lean_dojo.interaction.dojo:_submit_request:461 - {"sid": 2, "cmd": "exact nsmul_piAntidiag _ _ hn"}
Out[8]: LeanError(error='failed to synthesize\n  DecidableEq (ι → ℕ)\nAdditional diagnostic information may be available using the `set_option diagnostics true` command.')

In [9]: dojo.run_tac(s1, "classical exact nsmul_piAntidiag _ _ hn")
2025-03-14 17:59:21.942 | DEBUG    | lean_dojo.interaction.dojo:_submit_request:461 - {"sid": 2, "cmd": "classical exact nsmul_piAntidiag _ _ hn"}
Out[9]: ProofFinished(tactic_state_id=3, message='')


yangky11 avatar Mar 14 '25 18:03 yangky11

I'm not sure what's happening for the CommSemiring.toSemiring_injective example and will take a more detailed look when I have a chance (likely a few weeks later).

yangky11 avatar Mar 14 '25 18:03 yangky11