merlin icon indicating copy to clipboard operation
merlin copied to clipboard

Request: do not use chdir for ppx

Open rgrinberg opened this issue 2 years ago • 18 comments

This isn't something user facing, rather a matter of making it easy to use merlin correctly in ocamllsp. You might be aware that ocamllsp runs the merlin pipeline in a dedicated thread to make sure the server stays responsive while merlin is busy. That works well until preprocessing is involved. chdir is not threadsafe and therefore merlin's use of it for running pp/ppx breaks ocamllsp in some subtle (and painfully hard to debug) ways. Thus my humble request from the merlin team is to accept a patch that removes the use of chdir.

I haven't yet written the patch, but I'm thinking that it can be done by using the spawn library. This library is successfully used by dune and ocamllsp itself so it is well tested. It's also light on the dependencies.

The only problem I could imagine is that merlin has some code for launching processes on Windows. So perhaps we can keep the current code for windows and add a cwd argument? Or improve spawn in a way that makes this custom code unnecessary? cc @MisterDA

rgrinberg avatar Jan 01 '22 22:01 rgrinberg

cc @let-def

voodoos avatar Jan 07 '22 14:01 voodoos

I think this would be a nice improvement, even outside ocamllsp. I prefer to avoid adding a dependency on spawn, launching is already done from C code, it shouldn't be hard to extend this specific function with an extra parameter for the working directory. You are welcome to submit a patch (if you are ok).

let-def avatar Jan 12 '22 13:01 let-def

Agreed with let-def, no special need for a new dependency. I've taken a look at spawn and I have a couple of patches for spawn on Windows, that I'll tidy up and submit soon.

MisterDA avatar Jan 12 '22 13:01 MisterDA

I prefer to avoid adding a dependency on spawn, launching is already done from C code, it shouldn't be hard to extend this specific function with an extra parameter for the working directory.

The advantage of using spawn is that we'll have a single code path for spawning processes in lsp. That seems useful. if you don't want a dependency, you could also vendor the spawn library inside merlin. Just a suggestion, feel free to do it however you prefer. I'm just happy to see chdir go away.

rgrinberg avatar Jan 13 '22 03:01 rgrinberg

One issue I can see is that is that merlin gets a command line to run as a PPX, which is executed via system(). But the "fancy" spawn functions that allow setting the cwd use "program, argv"-style.

nojb avatar Jan 13 '22 07:01 nojb

Is that a problem in practice? In dune, we only generate ppx invocations that are independent of sh. And even if we needed sh, we could always spawn something like: spawn "sh" ["sh", "-c", cmd]

rgrinberg avatar Jan 14 '22 00:01 rgrinberg

Is that a problem in practice?

I'm not sure, but having to quote things always makes me nervous :)

nojb avatar Jan 14 '22 06:01 nojb

In merlin on Windows we're already using CreateProcess so adding a cwd argument should be easy. On other OSes, I guess the only reason we're calling system(3) is to redirect the ppx stdout to stderr. That's easy to do without the shell anyway, either with the Unix library or in C after the fork. I'm not sure what's the best way to change directory in the child, maybe just after the fork too?

MisterDA avatar Jan 14 '22 08:01 MisterDA

When I was manually writing .merlin files, I would use some shell syntax in the -pp argument to refer to environment variables... I don't think it is an important feature though, I am totally fine losing that... (Is it wrong?)

For the invocation, it would be nice to use posix_spawn, but the champions that designed it managed to miss the ability to specify the working directory. This has been added as a non portable extension, we cannot rely on it at the moment. That can be worked around with a few lines of C (vfork/chdir/execve, whatever). I can propose a patch for the unix-side of the code.

By the way, I think Merlin should still chdir to / after launching the server, but that should not matter for lsp-server.

Finally, there are two notions of "working directory" in Merlin:

  • internally, to affect the way paths are resolved: we want to find source files relative to the current directory; for this, the "current directory" should be the source directory
  • when invoking external processes, to mimic the build system as closely as possible; in the case of Dune, it means invoking external binaries such as ppx with the same current directory as Dune would choose.

I don't know if this subtlety can be communicated via the protocol; if not it is something to consider for the future.

let-def avatar Jan 18 '22 11:01 let-def

When I was manually writing .merlin files, I would use some shell syntax in the -pp argument to refer to environment variables... I don't think it is an important feature though, I am totally fine losing that... (Is it wrong?)

Hopefully nobody has to write a .merlin file manually ever again :)

For the invocation, it would be nice to use posix_spawn, but the champions that designed it managed to miss the ability to specify the working directory. This has been added as a non portable extension, we cannot rely on it at the moment. That can be worked around with a few lines of C (vfork/chdir/execve, whatever). I can propose a patch for the unix-side of the code.

Yes, fork (or vfork on linux) + chdir is the way to go. posix_spawn can be added as an optimization later (we plan to do it in spawn in fact)

By the way, I think Merlin should still chdir to / after launching the server, but that should not matter for lsp-server.

Why do you think so?

rgrinberg avatar Jan 19 '22 23:01 rgrinberg

That's easy to do without the shell anyway, either with the Unix library or in C after the fork. I'm not sure what's the best way to change directory in the child, maybe just after the fork too?

The way spawn does is it is chdir in the child and then exec. Exactly as you've said.

rgrinberg avatar Jan 19 '22 23:01 rgrinberg

I've prepared the following patch that seems to work:

https://github.com/rgrinberg/merlin/commit/5a41fbe40f75eb0f420ae828c96939da251a6042

I avoided the use of chdir even for linux as it's always in a racy environment. I tested it on windows/mac and it seems to work.

rgrinberg avatar Jan 30 '22 06:01 rgrinberg

I left a few commets about the Windows stubs there.

nojb avatar Jan 30 '22 09:01 nojb

Why do you think so?

@rgrinberg: Sorry for the late reply. In the past, we had users that launched merlin from a directory that was sometimes deleted soon after (for instance, after a build system cleanup).

The behavior was quite chaotic after the working directory was removed (most stuff work, but occasional actions would fail) It seems safe to chdir to /. Do you see any drawback to that?

let-def avatar Jan 31 '22 14:01 let-def

As long as this behavior isn't forced on those that use merlin as a library, I'm ok with it.

rgrinberg avatar Feb 01 '22 22:02 rgrinberg

What is the status on this ? @rgrinberg has your patch been in used in released versions of ocaml-lsp yet ? Would you mind opening a PR with its latest version ?

It is one of the last items blocking the direct use of Merlin as a lib in ocaml-lsp.

voodoos avatar Jun 29 '22 09:06 voodoos

Yes, it's been used in LSP for a while now. One thing that still leaves me unsatisfied about the patch is the indirection to shell. I'd much prefer if the preprocessing command was interpreted in the same that dune does. I suppose I can submit a patch if the issue above doesn't deter you.

rgrinberg avatar Jun 29 '22 16:06 rgrinberg

@rgrinberg, @let-def, any news on this ?

voodoos avatar Sep 13 '22 11:09 voodoos