lean4 icon indicating copy to clipboard operation
lean4 copied to clipboard

Watchdog sometimes gets stuck and stops processing requests for a file

Open ammkrn opened this issue 2 years ago • 11 comments

Occasionally, the extension will stop providing feedback for a given file, with the problem persisting after restarting the extension/vscode, and after removing build artifacts. The behavior goes away if I change the file name, and returns if I revert the file to the original name. I had a similar experience today, where it initially provides incorrect error/line numbers and then stops providing feedback.

I'm not sure how to trigger this or debug it through vscode so I can't provide a MWE, but this sounds like a cache issue, I can take a look if you point me in the right direction.

MacOS 11.6.6 vscode 1.67.2 extension v0.0.77 nightly-2022-05-31

ammkrn avatar Jun 11 '22 14:06 ammkrn

I'm also seeing this issue in the neovim plugin, so maybe this is a bug in the Lean 4 LSP implementation itself. Does the Lean 4: Restart Server command help?

gebner avatar Jun 14 '22 19:06 gebner

It does not. The only thing that seems to work is renaming the file.

ammkrn avatar Jun 14 '22 22:06 ammkrn

This looks like a bug in the watchdog. Transferring to the Lean 4 repo because this bug is not editor-specific.

gebner avatar Jun 15 '22 07:06 gebner

I'm going out on a limb here, but this could be due to this line: https://github.com/leanprover/lean4/blob/5896e6f1d6fc69441b476f3457172713e708f835/src/Lean/Server/Watchdog.lean#L308

If writing the exit message to the file worker blocks, then the file worker will never be erased. In particular it won't be restarted with didOpen, so the restart server command has no effect.

gebner avatar Jun 15 '22 07:06 gebner

I'm not sure I've encountered this myself so far; I get plenty of hangs, but they usually seem to be Emacs' fault.

We took care to make sure that the main loop of the file worker never blocks except on reading requests. Would be interesting to get stack traces from such a hung worker.

Kha avatar Jun 15 '22 09:06 Kha

But just to be clear, should "Restart Server" not replace the entire watchdog process?

Kha avatar Jun 15 '22 09:06 Kha

But just to be clear, should "Restart Server" not replace the entire watchdog process?

Oh, I confused that with "Refresh File Dependencies". Maybe something different is going on as well then.

We took care to make sure that the main loop of the file worker never blocks except on reading requests. Would be interesting to get stack traces from such a hung worker.

:+1: about the stack traces. We do all kinds of (potentially) blocking operations in the main loop though: printing to stdout/stderr, handle LSP requests (most handlers spawn a task for some of the work, but this is not enforced), there's also lots of potentially contended accesses to IO.Refs which could cause busy waiting.

gebner avatar Jun 15 '22 09:06 gebner

This issue has been plaguing me for a while in VSCode and is very unfortunate, It kills all the nice interactive features that are major positives of Lean.

However, I do believe I have figured out a surefire way to exit this state once in it for a file in VSCode. Close all open Lean tabs and then execute Developer: Reload Window. Once VSCode has refreshed you should be able to reopen the file and see it working as normal. Note that all Lean tabs must be closed before reloading VSCode. This is why, in my experience, simply restarting VSCode does not fix the issue -- it keeps the tabs open upon restart and thereby preserves the breakage.

tydeu avatar Jul 29 '22 20:07 tydeu

I'm seeing the same thing reported here https://github.com/leanprover/lean4/issues/1564 and I suspect the repro always involves editing through some invalid states. If I load good code and simply browse around, then everything is fine. So the scenario is I'm writing code, getting errors, and then I fix the code, lake build succeeds from the command line, but goto definition and hover tips are gone. And I can see goto definition working in other files, but not on the files I've been editing. So yes "watchdog stuck" is a great description. Refresh file dependencies did not fix it for me also. I didn't try restart server, I will try that next time, but close and restart vscode does fix it (except perhaps in the case Mac mentions above where VS code remembers which files you were editing and quickly goes right back into the stuck state).

lovettchris avatar Sep 08 '22 00:09 lovettchris

We probably need to write some async state machine proofs in the LSP, I would love to do that it would be very fun and would be related to a previous project I worked on called Coyote. The best way to handle async state machines is with an inbox actor model to ensure serialization of messages across the various async actors involved in the LSP implementation, that should include actors for IO, according to Gabriel's comments above.

lovettchris avatar Sep 30 '22 02:09 lovettchris

I can consistently reproduce this on my Windows machine, but I didn't manage to reproduce it yesterday on my Linux machine (both in VSCode).

Here is some behavior that might suggest possible problems:

  • Suppose I create a file by clicking in the Explorer Left Side Bar -> New File... -> Name the file Test1.lean. If I then edit the file, the server will update as expected.
  • Suppose I create an empty file with ctrl + N and then save (ctrl + S) it as Test2.lean the server will not update as expected. It will not react to keystrokes, and only update once whenever I run Restart File Dependencies / Restart File / Restart Server.
  • As mentioned above, renaming the file or Developer: Reload Window (without open Lean files) will fix the issue.
  • There is some weird behavior when renaming a file back to the original file name. Suppose I:
    • write #eval 1+1 in Test2.lean (from step 2)
    • Refresh File Dependencies -> the server updates once
    • save the file
    • rename it to Test3.lean
    • add a couple more lines all saying #eval 1+1 -> the server will update on every change as expected
    • save the file
    • rename the file back to Test2.lean -> I get the output only on the first #eval
    • Refresh File Dependencies -> I get the output on all #eval statements
    • Restart Server -> I get the output only on the first #eval. It seems that the old state is stored somewhere and not updated.
  • Another case of weird behavior: Suppose I delete Test1.lean (from step 1), create an empty file with ctrl + N and then save (ctrl + S) it as Test1.lean, then the server updates on every change as expected. Note that this is exactly the same steps as earlier to reach the broken state, but the fact that there previously was a file called Test1.lean where the server updated is enough for it to work.

Hopefully this helps. I can share a screen recording of me performing these steps if that is useful.

fpvandoorn avatar Sep 30 '22 11:09 fpvandoorn