repl icon indicating copy to clipboard operation
repl copied to clipboard

add garbage collection option

Open RexWzh opened this issue 8 months ago • 18 comments

Add a record option to optimize memory usage. The reference-counting issue is fixed by @ohyeat as discussed here.

RexWzh avatar Apr 10 '25 20:04 RexWzh

What is this meant to do? This needs documentation!

kim-em avatar Apr 10 '25 23:04 kim-em

I just updated the readme. Setting gc = true will discard the environment after execution, useful for memory management.

RexWzh avatar Apr 11 '25 05:04 RexWzh

The code modifications for unsafe def repl are written by @ohyeat , replacing loops with tail recursion.

I've tested these changes using statements from MiniF2F. Here are the test samples:

{"cmd": "import Mathlib\nopen BigOperators Real Nat Topology Rat"}

-- tactic mode
{"cmd": theorem algebra_amgm_faxinrrp2msqrt2geq2mxm1div2x :\n    ∀ x > 0, 2 - Real.sqrt 2 ≥ 2 - x - 1 / (2 * x) := by sorry", "env": 0}

-- command mode
{"cmd": "theorem algebra_amgm_faxinrrp2msqrt2geq2mxm1div2x :\n    ∀ x > 0, 2 - Real.sqrt 2 ≥ 2 - x - 1 / (2 * x) := by", "env": 0}

-- gc for tactic mode
{"cmd": "theorem algebra_amgm_faxinrrp2msqrt2geq2mxm1div2x :\n    ∀ x > 0, 2 - Real.sqrt 2 ≥ 2 - x - 1 / (2 * x) := by sorry", "env": 0, "gc": true}

-- gc for command mode
{"cmd": "theorem algebra_amgm_faxinrrp2msqrt2geq2mxm1div2x :\n    ∀ x > 0, 2 - Real.sqrt 2 ≥ 2 - x - 1 / (2 * x) := by", "env": 0, "gc": true}

And the test results:

image

RexWzh avatar Apr 14 '25 11:04 RexWzh

we also want to avoid saving the proofSnapshots too

FrederickPu avatar Apr 15 '25 15:04 FrederickPu

The proofSnapshots are essential for tactic mode - without them, the proofState id would be invalid when running commands in the format {"tactic": "xxx", "proofState":xxx}.

ADD:

However, the command that creates the proofSnapshot can be garbage-collected. See the green and yellow lines in the diagram above. I will add more tests later.

RexWzh avatar Apr 15 '25 15:04 RexWzh

Just a question of aesthetics. I would prefer the option to be called discardState instead of gc. Or maybe we do the opposite and call this option recordState or saveState, with true as default value. Don't know which option is better ^^

Other than that, I think this is a useful feature :+1:

augustepoiroux avatar Apr 15 '25 15:04 augustepoiroux

Just a question of aesthetics. I would prefer the option to be called discardState instead of gc. Or maybe we do the opposite and call this option recordState or saveState, with true as default value. Don't know which option is better ^^

Other than that, I think this is a useful feature 👍

You mean discardCmd, right?

As for discardState, I think implementing something like deleteState might be more plausible. We would rarely create a state and discard it immediately.

RexWzh avatar Apr 15 '25 15:04 RexWzh

The proofSnapshots are essential for tactic mode - without them, the proofState id would be invalid when running commands in the format {"tactic": "xxx", "proofState":xxx}.

ADD:

However, the command that creates the proofSnapshot can be garbage-collected. See the green and yellow lines in the diagram above. I will add more tests later.

The whole point of garbage collecting a command is that you never interact with its state again. So you won't apply tactics on its proof state

FrederickPu avatar Apr 15 '25 16:04 FrederickPu

@augustepoiroux I think using discard would be sufficient since it only applies to command mode.

with true as default value

We'd better set false as default to be compatibile with previous versions. For example, users need to set discard=false when first running import Mathlib :)

RexWzh avatar Apr 15 '25 17:04 RexWzh

Nice :+1: I agree with you about keeping backward compatibiliy. Regarding

with true as default value

The full sentence is

Or maybe we do the opposite and call this option recordState or saveState, with true as default value.

It is just a change of point of view. Instead of discarding commands, we record them. My naming was bad, but the idea was essentially to replace gc by record and inverting all the values. I.e. gc=True <-> record=False. And in this case we would have record=True as the default value, which keeps the current behavior of the REPL ;)

augustepoiroux avatar Apr 15 '25 17:04 augustepoiroux

@FrederickPu

The proofSnapshots are stored in proofStates and can exist independently of their corresponding env commands.

https://github.com/leanprover-community/repl/blob/506a6f35d7198e2ed3dcffe8dbd5750ee3fb906f/REPL/Main.lean#L61-L74

For verification, you can check the test cases in this PR:

https://github.com/leanprover-community/repl/pull/83/files

RexWzh avatar Apr 15 '25 17:04 RexWzh

then there should be an option to discard proof states as well. For example if you are doing proof repair, then you don't every use the tactic thing

FrederickPu avatar Apr 15 '25 17:04 FrederickPu

For example if you are doing proof repair, then you don't every use the tactic thing

@FrederickPu I've implemented similar command mode features for tactic mode and am currently testing how these features reduce memory usage.

Also, note that setting record=false will discard states created by tactics, but not states created by sorry. For example, tactics like have : 1=1 := by sorry will still create additional states.

btw, deleting goals could disrupt the proofState id, so we should avoid that approach.

The whole point of garbage collecting a command is that you never interact with its state again.

You can still interact with the state even after garbage collecting the command. This could address the issue you posted earlier.

RexWzh avatar Apr 17 '25 18:04 RexWzh

@kim-em I think this PR is ready.

Setting record=False enables garbage collection of environment snapshots. This helps reduce memory usage from continuously increasing, although commands like {"cmd":"import xxx"} cannot be garbage collected.

Additionally, it can serve as a health check command. For example: {"cmd":"#check true", "record":false} can be used to verify if the REPL is alive without modifying the environment snapshots.

Thanks for your time and effort for reviewing these changes.

RexWzh avatar May 03 '25 23:05 RexWzh

@kim-em Do you think we could merge this development? The new record parameter helps a lot limiting memory use on a REPL.

desaxce avatar Jul 25 '25 10:07 desaxce

Can someone make a summary of what "record": false does exactly in each case? From what I understood, if the command contains sorries, then the proof states are recorded, and the command state is transitively recorded as well, right? Is there a way to not record a command state, regardless of whether it contains sorries?

augustepoiroux avatar Jul 25 '25 13:07 augustepoiroux

In the following example

{ "cmd": "import Mathlib" }
{"env": 0}
{ "cmd": "theorem womp : 2 + 2 = 4 := sorry", "env": 1}
{ "sorries" : [ {"proofState": 0, ....}] }

The proofState=0 is defined entirely in terms of the context env=0 so env=1 can be discarded entirely. I think that was @RexWzh 's idea. However, if the command for env=1 created new declarations I don't think this intuition would hold.

The only state that is transitively saved in the proofSnapshot is the Enviroment through Meta.Context or something like that. Ie all of the parsed information from the Command.json won't be included only the context necessary for creating the syntax of the tactic state for the proofsnapshot.

FrederickPu avatar Jul 25 '25 14:07 FrederickPu

I think it would be nice to have a way of discarding proofsnapshots once they are no longer being used. Ie you've finished with your proof search for a particular problem.

FrederickPu avatar Jul 25 '25 15:07 FrederickPu