expert icon indicating copy to clipboard operation
expert copied to clipboard

fix: resolve versions mismatch causing a crash when building engine

Open heywhy opened this issue 4 months ago • 14 comments

This generates a .tool-versions file in the engine source before building so that the correct Erlang and Elixir versions are used.

See #91

heywhy avatar Aug 30 '25 03:08 heywhy

I'm going to let @doorgan review this, but the PR title will go on the changelog, so it needs to be more descriptive.

mhanberg avatar Aug 30 '25 09:08 mhanberg

I'm not sure this is the right approach for this issue, we need to support workspace folders/multiple root projects(like expert) that might be running on different elixir/erlang versions This approach will lock Expert to a single elixir/erlang pair not only for subprojects, but also for any project you use Expert on

Ultimately the issue here is that we're lacking the complete PATH at the project directory, and I wonder if this:

Enum.map(System.get_env(), fn
      {"PATH", path} -> {"PATH", "#{asdf_path}:#{path}"}
      other -> other
    end)

is what's causing part of the issue in #91, asdf_path which contains the asdf shims comes before the rest of the path, which may be making asdf to pick up the global version instead of the local one

There are other issues that won't be fixed unless we manage to get the right environment here, for example if you use rustler then the project compilation will fail because it won't be able to find rustc

doorgan avatar Aug 30 '25 16:08 doorgan

I'm not sure this is the right approach for this issue, we need to support workspace folders/multiple root projects(like expert) that might be running on different elixir/erlang versions

I think it's essential to note that every project overrides its respective versions.

Ultimately the issue here is that we're lacking the complete PATH at the project directory, and I wonder if this

This is not the case for the issue I reported. The problem here is straightforward:

  1. The project I'm on has Elixir 1.18-otp-28 and Erlang 28.0
  2. I have a global tools version that defers to Elixir 1.17-otp-27 and Erlang 27.0
  3. The engine build process picks the correct elixir version (1.18-otp-28) through the elixir_executable method, but uses the global Erlang version (27.0) defined in the global .tool-versions file in my home directory

So, to mitigate the version mismatch, this PR generates a .tool-versions file that has the same Elixir and Erlang versions as the project it's compiling for.

heywhy avatar Aug 30 '25 17:08 heywhy

There are other issues that won't be fixed unless we manage to get the right environment here, for example if you use rustler then the project compilation will fail because it won't be able to find rustc

I don't see how this affects the current engine build process, except that there's a plan to integrate Rust in the future. In addition, the changes made cater to merging the versions as needed, assuming the engine source has a .tool-versions file and requires a specific version of Rust. Maybe I need to look into the codebase more to understand what you're trying to point out here.

And just to emphasize again, the issue isn't with my project building, but the expert engine build failing because the Elixir version used to compile it was built for a different otp version that is defined in my global .tool-versions file.

heywhy avatar Aug 30 '25 17:08 heywhy

Sorry, I misread part of your PR and I'm wrong about this locking expert to a single version pair. You're right on most of it, and the only scenarios where I think this may cause issues is if two engines are trying to be built(either because you launched more than one expert at the same time or if you open two files from two subprojects at the same time once #18 is merged). So it's probably fine?

There are other issues that won't be fixed unless we manage to get the right environment here, for example if you use rustler then the project compilation will fail because it won't be able to find rustc

I don't see how this affects the current engine build process, except that there's a plan to integrate Rust in the future

To clarify, we use the same method to use the elixir/erlang versions for both the vm that builds the engine, and the vm that runs the engine. If your project has rustler as a dependency, even if Expert doesn't use it itself, the engine node will fail to compile it if rustc is not in the PATH used when launching the engine node. I see that this PR doesn't break this, but my point is that if we're getting the wrong PATHs, this PR fixes the issue of finding the right elixir/erlang but it will still miss any other thing that the project needs in the PATH, rustc is one example.

Ultimately the issue here is that we're lacking the complete PATH at the project directory, and I wonder if this

This is not the case for the issue I reported. The problem here is straightforward:

  1. The project I'm on has Elixir 1.18-otp-28 and Erlang 28.0
  2. I have a global tools version that defers to Elixir 1.17-otp-27 and Erlang 27.0
  3. The engine build process picks the correct elixir version (1.18-otp-28) through the elixir_executable method, but uses the global Erlang version (27.0) defined in the global .tool-versions file in my home directory

The reason you're getting the wrong version is due to asdf not providing a complete PATH, before I added some code to derive the path to the asdf shims and manually add it to the PATH it wouldn't be able to find erlang at all. For example, this is the PATH I get from asdf env elixir:

/Users/dorgan/.asdf/installs/elixir/1.17.0/bin:/Users/dorgan/.asdf/installs/elixir/1.17.0/.mix/escripts:/Users/dorgan/Library/Application Support/.burrito/expert_erts-15.2.6_0.1.0/erts-15.2.6/bin:/Users/dorgan/Library/Application Support/.burrito/expert_erts-15.2.6_0.1.0/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/run/current-system/sw/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

Asdf is not giving me the path to erlang, nor rust, zig or any of the other asdf managed programs I have. This is the issue we need to fix. The reason you're getting the globally installed erlang version is because of the code that prepends this to the PATH: /Users/dorgan/.asdf/shims

This PR does mitigate the issue for the engine build, but we still need to find a way to get the right PATH

doorgan avatar Aug 30 '25 17:08 doorgan

tl;dr I think we can use this to mitigate the issue temporarily but we still need to fix the root cause

doorgan avatar Aug 30 '25 18:08 doorgan

Reading your comment here really helped me to understand your points better.

I think this may cause issues is if two engines are trying to be built(either because you launched more than one expert at the same time or if you open two files from two subprojects at the same time once https://github.com/elixir-lang/expert/pull/18 is merged). So it's probably fine?

I would have suggested generating a lock file, but I worry about when an unexpected interruption happens, leaving a dangling lock file behind. I'm tempted not to support the idea because the goal of the LSP should be a seamless experience without the users having an idea of the internal workings. In addition, if there is an overwrite one way or the other, I would assume a simple server restart should do.

tl;dr I think we can use this to mitigate the issue temporarily but we still need to fix the root cause

I agree that the PR can be merged temporarily until a robust fix is found (which I will obviously like to take a stab at), so as to prevent other users from facing the same issue as myself which took me hours to figure out.

Is there a channel where questions can be asked aside from GitHub issues? I would assume there is a channel in the Elixir Discord Server.

heywhy avatar Aug 30 '25 19:08 heywhy

@doorgan I think in elixir-tools.nvim to install and compile Elixir LS with the projects versions, I would copy the source code to a temporary directory in the project itself so it would see its PATH/etc, but then move the artifacts to the central cache for.

A similar thing could work here. I'm on mobile and haven't read the rest of the convo above but the figured I drop this before I forget.

mhanberg avatar Aug 30 '25 19:08 mhanberg

@mhanberg, I don't think copying to a temporary folder is efficient, considering how large some projects can be.

Regardless, I already updated the implementation to pick the current versions specified in a project's .tools-versions.

heywhy avatar Aug 31 '25 20:08 heywhy

@mhanberg, I don't think copying to a temporary folder is efficient, considering how large some projects can be.

Regardless, I already updated the implementation to pick the current versions specified in a project's .tools-versions.

We would copy the engine source code, not your project source code.

In general it would be better to pick a solution that works in general and not for a specific set of tool managers.

mhanberg avatar Aug 31 '25 20:08 mhanberg

We would copy the engine source code, not your project source code.

This is clear. My only concern is how fast this would pollute the filesystem, where every project would require its own build of the engine source code (as long as all projects do not necessarily share the same Elixir and Erlang versions). I do understand the rationale behind the current architectural choice.

I already have "elixir-1.17.3-erts-15.2.7" and "elixir-1.18.4-erts-16.0.2" folders in my user data directory, and this means the folders would keep increasing as long as I work with different combinations of Elixir and Erlang runtime versions.

I would suggest copying to the project _build directly so that all built assets can be easily cleaned instead of having dangling built assets hanging around, even after their respective project have been deleted.

In general it would be better to pick a solution that works in general and not for a specific set of tool managers.

Is this really possible, since one can't enforce the manner of installation that everybody should use? 🤔

heywhy avatar Sep 01 '25 01:09 heywhy

I would suggest copying to the project _build directly so that all built assets can be easily cleaned instead of having dangling built assets hanging around, even after their respective project have been deleted.

Unfortunately that's not viable, the engine needs to be built for the exact elixir version that is going to run it. I learned this the hard way a while ago when I got this error:

 ** (ErlangError) Erlang error: {:exception, :undef, [{:elixir_quote, :shallow_validate_ast, [:default], []}, {Future.Code.Typespec, :type_to_quoted, 1, [file: ~c"lib/future/code/typespec.ex", line: 78]}, {XPEngine.Modules, :"-fetch_types/1-fun-0-", 2, [file: ~c"lib/engine/engine/modules.ex", line: 88]}, {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: ~c"lib/enum.ex", line: 2531]}, {XPEngine.Modules, :fetch_types, 1, [file: ~c"lib/engine/engine/modules.ex", line: 86]}, {XPEngine.CodeIntelligence.Docs, :parse_docs, 2, [file: ~c"lib/engine/engine/code_intelligence/docs.ex", line: 46]}, {XPEngine.CodeIntelligence.Docs, :for_module, 2, [file: ~c"lib/engine/engine/code_intelligence/docs.ex", line: 26]}, {XPEngine, :docs, 2, []}]}

I compiled the engine with one elixir version, ran it with the respective project's elixir, and it stopped working because some of the compiled code included calls to internal elixir functions that changed between versions. This happened on a quote call, and can happen with any other call.

ElixirLS does the same we're trying to do here for the very same reason, although it's a bit easier there because it's a single vm that runs everything so there's no PATH issues there.

Is this really possible, since one can't enforce the manner of installation that everybody should use? 🤔

I want to believe there must be a way but personally I'd be satisfied for now if we can reliably fix the asdf case

doorgan avatar Sep 01 '25 02:09 doorgan

Unfortunately that's not viable, the engine needs to be built for the exact elixir version that is going to run it. I learned this the hard way a while ago when I got this error:

Very tricky situation to be in.

personally I'd be satisfied for now if we can reliably fix the asdf case

I think the latest changes I made should be sufficient enough as what was done is simply to add selected/preferred tools (versions with the asterisk sign in front) to the PATH variable.

Screenshot 2025-09-01 at 10 01 16

In the screenshot above, the bin directory for both Elixir (1.18.4-otp-28) and Erlang (28.0.2) would be added to PATH. And if other tools versions were selected/defined too, they would get added to the PATH

heywhy avatar Sep 01 '25 09:09 heywhy

We would copy the engine source code, not your project source code.

This is clear. My only concern is how fast this would pollute the filesystem, where every project would require its own build of the engine source code (as long as all projects do not necessarily share the same Elixir and Erlang versions). I do understand the rationale behind the current architectural choice.

I already have "elixir-1.17.3-erts-15.2.7" and "elixir-1.18.4-erts-16.0.2" folders in my user data directory, and this means the folders would keep increasing as long as I work with different combinations of Elixir and Erlang runtime versions.

I would suggest copying to the project _build directly so that all built assets can be easily cleaned instead of having dangling built assets hanging around, even after their respective project have been deleted.

In general it would be better to pick a solution that works in general and not for a specific set of tool managers.

Is this really possible, since one can't enforce the manner of installation that everybody should use? 🤔

We only need to build the engine once for a set of versions, performance concerns are negligible.

I did this for Elixir LS and it was fine.

I don't understand what you mean by copying the _build directory, the problem is with what directory we are building the engine side of. The build artifacts are already sent to the appropriate place automatically.

If we build the engine inside of the project directory, it should pick up the right versions the same way your shell does. It's been done before.

mhanberg avatar Sep 01 '25 10:09 mhanberg

Hi @mhanberg @doorgan, I thought to nudge you that the implementation was updated. I don't know what your thoughts are, except that you're still inclined to find a fix that works without considering installed package managers.

In addition, this PR also resolves #104, as it was something I faced when I updated one of my environment variables.

EDIT: I believe it fixes #128 too

heywhy avatar Sep 13 '25 00:09 heywhy

Thank you for the ping of my issue, will test later if this resolves it. Just something you might consider is to also update the mise functionality since this should have the exact same issue as the asdf pathing in the code.

dennym avatar Sep 15 '25 12:09 dennym

@heywhy It seems it does not resolve the issue #128. See https://github.com/elixir-lang/expert/issues/59#issuecomment-3299988186

badosu avatar Sep 16 '25 20:09 badosu

please for all that is code, do not depend on a version manager to do this. asdf is not supported on Windows, and not everyone uses it. Making the official LSP depend on it would isolate whoever wants to work on Windows or not use asdf. Supporting each version manager could become a burden too.

Most, if not all, LSPs that depend on an executable to be run, uses what is in the path or through configuration.

See JDK-Requirements for how vscode-java does it

john-eevee avatar Oct 09 '25 14:10 john-eevee

@john-eevee for what it's worth, Expert does not depend on version managers, it will use whichever elixir you have in your PATH(see here). But if you use a version manager, we need it to pick up the same elixir your version manager uses, that's what we need to fix.

doorgan avatar Oct 09 '25 14:10 doorgan

@doorgan oh thanks for the link, I saw the focus on asdf and jumped to conclusions.

If I'm not mistaken the original issue was due to the Erlang version mismatch? If so the issue also provided an idea to use a erlang_executable function like the elixir one. That would solve the pickup problem?

If it's the engine application that needs the same version as the project, copying it to the .expert directory and building per project is a more straightforward way, it generates a bit of pollution but nothing crazy.

We could need to provide a message to the user that the engine crashed and urge them to check their versions.

john-eevee avatar Oct 09 '25 14:10 john-eevee

@john-eevee

If I'm not mistaken the original issue was due to the Erlang version mismatch?

That's part of the issue, the real problem is that we're not getting the right path that the user shell would see in the project directory.

@heywhy I opened #162 which uses a more general approach to fix this issue, please feel free to try it out

doorgan avatar Oct 09 '25 19:10 doorgan

Hi @doorgan, I've tested the fix you made, and everything works just fine. I would go ahead and close this PR.

heywhy avatar Oct 15 '25 23:10 heywhy

@heywhy that's great to hear, thank you for your time working on this PR! 🙇

doorgan avatar Oct 16 '25 02:10 doorgan