dune icon indicating copy to clipboard operation
dune copied to clipboard

Dune crashes when editing a file in exec watch mode

Open LAC-Tech opened this issue 1 year ago • 21 comments

Expected Behavior

Dune not to crash when editing data.ml

Actual Behavior

Dune crashes when data.ml is edited, even if you just save the file without changing anything, with this output:

Error: _build/default/.hello.eobjs/dune__exe__Hello.impl.d: No such file or
directory
-> required by _build/default/.hello.eobjs/dune__exe__Hello.impl.all-deps
-> required by _build/default/hello.exe
Had 1 error, waiting for filesystem changes...      

Reproduction

clone https://github.com/LAC-Tech/dunebug run

dune exec -w ./hello.exe

Save file data.ml in your editor. (saving hello.ml works fine). You may sometimes have to save 2 or 3 times to get it to crash.

Specifications

dune 3.16.0 The OCaml toplevel, version 5.2.0 Arch Linux, kernel 6.10.6, x86_64

LAC-Tech avatar Oct 13 '24 20:10 LAC-Tech

This might be related to #10959.

maiste avatar Oct 17 '24 13:10 maiste

@LAC-Tech could you try again and tell me if it is working with the latest version of dune? It seems to have been fixed with #10960. I can't reproduce with your example again using main version of dune.

maiste avatar Oct 22 '24 10:10 maiste

Hi, excuse the late reply and basic question - but what's the best way to get this main dune version? build myself, install with opam flag? Just point me in the right direction.

LAC-Tech avatar Nov 01 '24 22:11 LAC-Tech

You can either build it yourself or use the binary version from https://preview.dune.build (nightly version of dune built from the main branch).

maiste avatar Nov 02 '24 13:11 maiste

Thanks. Exact same issue I'm afraid.

$ ~/.dune/bin/dune --version
"Dune Developer Preview: build 2024-11-02T02:13:00Z, git revision
c2e0160fd0b593c63536c7081238da1da5140720"

Then running with: ~/.dune/bin/dune exec -w ./hello.exe, and saving data.ml (which seems to be what's causing the race condition), I get the same error message.

Tried with a plain neovim, default config no plugins incase an LSP was causing an issue. But no luck.

Is there any information I can provide that might help?

LAC-Tech avatar Nov 03 '24 00:11 LAC-Tech

Thanks for trying! I was betting on the fact that the mentioned PR introduced delays in computation that could possibly solve the issue. Apparently not.

You already provided a reproduction case so we don't need extra information.

maiste avatar Nov 03 '24 10:11 maiste

I am unable to reproduce this. Is anybody else able to reproduce the issue?

Alizter avatar Apr 09 '25 18:04 Alizter

Greetings future people trawling obscure github issues trying to solve their problem!

I have solved the issue, it was actually a neovim issue. Something about the way neovim (and I believe vim) saves files interacts very poorly with hot reloading (I had this same issue with bun).

Turns out you need to add this to your init.lua:

vim.opt.backupcopy = "yes"

I'm reliably informed by an LLM that it's the following in vim script, but I cannot confirm:

set backupcopy=yes

So this was my mistake, I should have tried another editor. But on the plus side, this should help other n(vim) users.

LAC-Tech avatar Apr 17 '25 09:04 LAC-Tech

Thanks for providing the feedback and the notes for future people! I'm closing as completed 👍

maiste avatar Apr 17 '25 12:04 maiste

I'm still curious why this made Dune crash however.

Alizter avatar Apr 17 '25 12:04 Alizter

Maybe you could clear something up. Was it a crash or an error that you were experiencing?

Alizter avatar Apr 17 '25 12:04 Alizter

@Alizter it was an error I think, perhaps crash was a poor choice of words. The process did not terminate, but was stuck in a broken state while in watch mode.

LAC-Tech avatar Apr 18 '25 07:04 LAC-Tech

Thanks, but I suppose it was able to recover once you edited the file again. I'm really just interested in knowing if dune was able to recover from this.

Alizter avatar Apr 18 '25 08:04 Alizter

It was not able to recover, no.

You can confirm for yourself with a default neovim

LAC-Tech avatar Apr 19 '25 01:04 LAC-Tech

Excellent! I was able to reproduce this. We definitely have something racy going on here and Dune misses a dependency. I will investigate further into what exactly neovim is doing and try to reproduce this in a cram test.

Alizter avatar Apr 19 '25 09:04 Alizter

Here is a repro case using nvim:


  $ DONE_FLAG=_build/done_flag

  $ cat > dune-project <<EOF
  > (lang dune 3.18)
  > EOF

  $ cat > dune <<EOF
  > (executable
  >  (name foo)
  >  (libraries unix))
  > EOF

  $ cat > touch.ml <<EOF
  > let touch path =
  >  let fd = Unix.openfile path [ Unix.O_CREAT ] 0o777 in
  >  Unix.close fd
  > ;;
  > EOF
 
  $ cat > foo.ml <<EOF
  > let () =
  >   Touch.touch "$DONE_FLAG"
  > ;;
  > EOF

  $ dune exec -w -- ./foo.exe &
  Success, waiting for filesystem changes...
  Error: _build/default/.foo.eobjs/dune__exe__Foo.impl.d: No such file or
  directory
  -> required by _build/default/.foo.eobjs/dune__exe__Foo.impl.all-deps
  -> required by _build/default/foo.exe
  Had 1 error, waiting for filesystem changes...
  $ PID=$!
  $ ./wait-for-file.sh $DONE_FLAG

  $ nvim --headless touch.ml -c 'write'   \
  > -c 'quit' 2> /dev/null
  $ kill $PID

this uses the watching stuff in test-cases/exec-watch.

Strange difference with exec and build here is that dune build -w doesn't run into this problem, it manages to build ok. dune exec on the other hand, including my implementation in #11621 suffers from this issue. I suspect it is something to do with the file watcher. I will investigate further.

Alizter avatar Apr 19 '25 10:04 Alizter

Some progress today. I've managed to reproduce the issue with dune without neovim by using inotifywait and observing exactly what nvim (and probably vim) is doing. It's good that vim users can fix this behaviour, but dune should be more resilient in these situations and should be able to recover.

I've checked our inotify library, our file_watcher library and everything reported there looks sane. I suspect the issue lies with Fs_memo, but I need to dissect this issue further to be certain.

Alizter avatar Apr 21 '25 13:04 Alizter

Here is my new and improved repro case:

Reproduction case for #11010. Neovim saves files in a perculiar way which interacts poorly
with dune watch mode.

  $ DONE_FLAG=_build/done_flag

  $ cat > dune-project <<EOF
  > (lang dune 3.18)
  > EOF

  $ cat > dune <<EOF
  > (executable
  >  (name foo)
  >  (libraries unix))
  > EOF

  $ cat > touch.ml <<EOF
  > let touch path =
  >  let fd = Unix.openfile path [ Unix.O_CREAT ] 0o777 in
  >  Unix.close fd
  > ;;
  > EOF
 
  $ cat > foo.ml <<EOF
  > let () =
  >   Touch.touch "$DONE_FLAG"
  > ;;
  > EOF

Simulate the detailed vim save sequence observed via inotifywait.
Timing is crucial and needs to be tuned via sleep based on observation.
  $ simulate_nvim_save() {
  >   local original_fn="$1"
  >   local sleep_duration=${2:-0.01}
  >   local backup_fn="${original_fn}~"
  > 
  >   # Simulates MOVED_FROM, MOVED_TO
  >   mv "${original_fn}" "${backup_fn}"
  >   
  >   # Sleep after rename to backup
  >   sleep "$sleep_duration"
  > 
  >   # Simulate new file creation and write
  >   echo "let () = print_endline \"saved\"" >> "${original_fn}"
  >   
  >   # Sleep after new original creation/write/close
  >   sleep "$sleep_duration" 
  >   rm "${backup_fn}"
  > }

  $ dune exec -w -- ./foo.exe --debug-backtraces &
  Success, waiting for filesystem changes...
  Error: _build/default/.foo.eobjs/dune__exe__Foo.impl.d: No such file or
  directory
  -> required by _build/default/.foo.eobjs/dune__exe__Foo.impl.all-deps
  -> required by _build/default/foo.exe
  Had 1 error, waiting for filesystem changes...
  $ PID=$!
  $ ./wait-for-file.sh $DONE_FLAG

This might not trigger correctly on every machine, so this parameter can be tuned
accordingly.
  $ simulate_nvim_save touch.ml 0.01

  $ kill $PID

Alizter avatar Apr 21 '25 14:04 Alizter

I have a fix for this issue, but I am not yet sure it is the right fix. The idea is to introduce a small delay just before we run our build and program via Scheduler.sleep ~seconds:0.1. Doing this avoids the race condition observed in the cram test.

This might just be pushing the real problem under the rug however and I am not too fond of making all rebuilds slower for no reason.

Alizter avatar Apr 21 '25 15:04 Alizter

Can we make it so that dune just doesn't crash on this error and just keeps waiting for the next file change? Presumably at some change the save will be finished and dune will be able to build successfully?

yawaramin avatar Apr 21 '25 16:04 yawaramin

@yawaramin Yes, that is the goal here. The issue is that Dune's internal file system tracking becomes invalidated leading it to believe a file is missing when it is actually present. This is likely due to a race-condition with how we are loading rules in dune exec -w.

Once this internal representation is broken, subsequent rebuilds cannot progress due to the erroneous state. We don't get this issue in dune build -w since things are correctly loaded there.

This is reported as a user-error but it is really an unexpected side-effect of dune's current implementation that needs to be fixed. I don't think special-casing this for a full-reload of the rules is a good idea when we can just fix it.

I'm still not sure what the best fix is, but I will come up with something soon.

Alizter avatar Apr 21 '25 18:04 Alizter