rules_haskell
Suggestion: improved process for language server setup
I'm configuring haskell-language-server on a huge codebase and have run into many problems with the approach currently documented in https://rules-haskell.readthedocs.io/en/latest/haskell-use-cases.html#configuring-ide-integration-with-ghcide.
I'm opening this ticket to describe a different approach, which mostly only impacts documentation. Depending on the outcome of the discussion, I could open a documentation PR.
First, some context: I have a huge codebase with many Haskell packages and a deep dependency tree. I also have Haskell code generated at build time, and dependencies on many shared libraries built in different languages. I'm not aware of a more complex codebase using rules_haskell.
The documentation proposes setting up a global `haskell_repl` which references every `haskell_library` in the repository. The documentation does not spell it out, but you have two ways to reference your libraries:
- Using `from_source`, each Haskell module is loaded from source. This does not scale for me: haskell-language-server takes hours (and a lot of RAM) to start.
- Using `from_binary`, the packages linked into the repl are first compiled by Bazel and then added to the build information as packages. This works for me, but it forces the user to wait for a full build of the codebase, which takes more than an hour, or about a quarter of an hour if the user has fast access to a remote cache.
- Either way, it forces the creation of a global repl referencing all the targets of the repository. This global repl is difficult to maintain (you need to add/remove entries every time something changes in the repo). We used an automated process (based on `bazel query`) to generate the `BUILD` file for this repl. One problem was that some targets fail for various reasons, so we had to tag all of our failing targets to ensure they don't end up in the repl.
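For illustration, the query behind such a generation process could be built like this small sketch (the `broken-repl` tag name is a hypothetical placeholder, not the tag we actually used):

```shell
#!/usr/bin/env bash
# Sketch (hypothetical): build a bazel query listing every haskell_library
# except those tagged as broken, to regenerate the global repl's BUILD file.
# The tag name "broken-repl" is a placeholder.

make_global_repl_query() {
  printf 'kind(haskell_library, //...) except attr(tags, "%s", //...)' "$1"
}

# The generation step would then run something like (left commented here):
#   bazel query "$(make_global_repl_query broken-repl)"
make_global_repl_query "broken-repl"
```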
I tried a different approach, with success. Instead of building a global repl, I'm using the repl associated with the `haskell_library` that uses the file.
The `.hie-bios` script receives the path of the file being checked as its first argument. I'm then using the following query:
bazel build $(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $(bazel query "$FILEPATH" ))")@repl
where `FILEPATH` is the path of the file provided by HLS, and `@repl` is the repl attached to each `haskell_library` or `haskell_binary`.
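The approach can be sketched as a small `.hie-bios` script. This is a sketch under assumptions (the `@repl` suffix is as described above; this is not my exact script), with the bazel invocation itself left as a comment:

```shell
#!/usr/bin/env bash
# Sketch of a .hie-bios script (hypothetical, not the exact setup described).
# hie-bios passes the path of the file being checked as the first argument.
FILEPATH="$1"

# Build the query that selects the haskell_library owning $FILEPATH.
make_query() {
  printf 'kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), %s)' "$1"
}

# The actual invocation would then be (commented out in this sketch):
#   bazel build "$(bazel query "$(make_query "$FILEPATH")")@repl"
make_query "$FILEPATH"
```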
There are a few positive points to this approach:
- I can use `from_binary` on the repl without forcing a rebuild of the full codebase.
- If I'm using `from_source`, it is much faster because it does not need to evaluate 1000 Haskell files.
- I still have a global "wildcard" to feed `from_binary` and `from_source` for each repl, which I can tune.
- No need for a global repl anymore. That's a huge improvement: I don't need to maintain a static file for it (or the linter scripts that ensure the static file is up to date).
- It is more robust to failure. If one target is broken for any reason, it will only fail in the language server for that target, instead of crashing the repl for all targets. This was really painful in my former implementation, and I had to manually tag all the "working" targets in the repo.
- It now works for `haskell_binary` and `haskell_test`, because they can have a repl.
What are your thoughts about this approach? I can update the rules_haskell documentation with it if you think that's a good idea.
Haskell language server
Unrelated note: I'm using haskell-language-server instead of ghcide as described in the documentation. I have no special problem with it, except:
- #1482: it fails if there are two packages with the same name in the project (such as `//third_party/hackage:lens` and `//my/project:lens`, which share the same package name `lens`). This problem appears less with the approach I just described.
- I have a problem with `PATH`: `haskell-language-server` does not find the `ghc` used by my project (this `ghc` is provided by `nixpkgs_package` in Bazel). I had to write a wrapper for that.
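For illustration, the `PATH` wrapper idea could look something like this sketch (the GHC location and the `haskell-language-server-wrapper` entry point are assumptions about a typical setup, not my actual script):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: make the project's ghc visible to haskell-language-server.
# "/opt/ghc/bin" below is a placeholder for wherever nixpkgs_package puts ghc.

prepend_path() {
  # Prepend a directory to a PATH-style string.
  printf '%s:%s' "$1" "$2"
}

# The real wrapper would then do something like (left commented here):
#   export PATH="$(prepend_path "$GHC_BIN_DIR" "$PATH")"
#   exec haskell-language-server-wrapper "$@"
prepend_path "/opt/ghc/bin" "$PATH"
```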
Considering that ghcide is now deprecated in favor of haskell-language-server, I propose to update the documentation to recommend haskell-language-server instead.
Thank you for raising this! I agree the docs on IDE integration could definitely use improvement. As they say at the beginning, the current status is preliminary. I'd be very happy to take PRs on this! I don't think I'll have time to work on hls support myself in the foreseeable future.
Considering that ghcide is now deprecated in favor of haskell-language-server, I propose to update the documentation to recommend haskell-language-server instead.
Agreed, the docs should be updated from ghcide to hls.
The documentation does not document it, but you may have two solutions to reference your libraries: [...]
Yes, this should be clarified in the use-case docs as well. The from_binary and from_source attributes are documented in the API docs. Just to clarify, the from_source and from_binary attributes don't directly reference targets; instead they are patterns to match on targets included in the transitive closure of deps. More technically, they are of type attr.string_list, i.e. they cannot incur a dependency.
I'm then using the following query:
Could you clarify what motivates this query and how it works?
IIUC it aims to find the closest haskell_library to a given source file and picks its autogenerated @repl target. But, IIUC, it relies on somepath returning the shortest path. Is that guaranteed? I can't find it stated in the docs. Also, this only seems to support haskell_library; how do you treat haskell_binary targets?
Another note: query does not take configuration into account, so query may yield bogus results on cross-platform projects. In the past I have used this cquery to discover the haskell_library/binary/test corresponding to a source file. Any thoughts on this?
I can use `from_binary` on the repl without forcing a rebuild of the full codebase. If I'm using `from_source`, it is way faster because it does not need to evaluate 1000 Haskell files.
Could you clarify how you set these attributes and what values you set them to?
The @repl targets are autogenerated with from_source set to only include the target at hand. I.e. all dependencies are loaded as packages and, therefore, have to be built. E.g. if the given target depends on all other Haskell targets in the project, then indeed all other targets would need to be built first.
I do still have a global "wildcard" in order to feed `from_binary` and `from_source` for each repl, that I can tune.
Similar to above, it's not clear to me where you can define these global patterns.
No need for a global repl anymore. That's a huge improvement. I don't need to maintain a static file for it (or the linter scripts that ensure the static file is up to date).
Agreed, requiring a single global repl for hls integration is not what we want. The historical reason is that multi-cradle support was not available at the time when this was developed. See related discussion here.
Relatedly, I experimented in the past with generating hie-bios files on the fly directly from the aspect, see here. Unfortunately, I ran out of time on these experiments. An issue I encountered in that approach, which may be relevant here, was that ghcide (this was not on hls yet) ended up creating too many overlapping ghc sessions, which ate up too much memory.
More robust to failure. If one "target" is broken for any reason, it will only fail in the language server for that target, instead of crashing the repl for all targets. This was really painful in my former implementation and I had to manually tag all the "working" targets in the repo.
Build failure should only be an issue for from_binary dependencies. from_source dependencies should not be built and any errors in them can be reported by ghci or hls.
It now works for haskell_binary and haskell_test, because they can have a repl.
I don't understand this point. haskell_binary and haskell_test could have a repl target before; in fact rules_haskell autogenerates one for them.
Thank you for reading this and commenting, let's address your questions.
The documentation does not document it, but you may have two solutions to reference your libraries: [...]
[...] Just to clarify, the from_source and from_binary attributes don't directly reference targets; instead they are patterns to match on targets included in the transitive closure of `deps`. More technically, they are of type `attr.string_list`, i.e. they cannot incur a dependency.
Yes, thank you for the clarification. I'm indeed using them as patterns; for example, I have `//...` in from_source and `//third_party/...` in from_binary, considering that third_party doesn't change much and may very well have a weird build setup.
I'm then using the following query: [...] Could you clarify what motivates this query and how it works? IIUC it aims to find the closest `haskell_library` to a given source file and picks its autogenerated `@repl` target. But, IIUC, it relies on `somepath` returning the shortest path. Is that guaranteed? I can't find it stated in the docs. Also, this only seems to support `haskell_library`; how do you treat `haskell_binary` targets?
somepath returning the shortest path is not guaranteed, but it is something I observed. Actually, the semantics are correct with any path; it just has an influence on performance.
bazel build $(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $(bazel query "$FILEPATH" ))")@repl
You are right about your understanding of the query. Let me detail it:
- `somepath` does indeed select "one" `haskell_library`, based on the "target name" for `$FILEPATH`.
- I'm getting the "target name" for `$FILEPATH` using `bazel query $FILEPATH`, but it actually works by passing `$FILEPATH` directly in the main query.
- The `intersect` is there to refine the final result, which contains the initial file. I don't really understand why, but the example usage of `somepath` in the documentation does include the `intersect`.
- I treat `haskell_binary` similarly. One solution may be to do the same query for both kinds and join them with `union`, but I have no guarantee about which one will be returned first. Another solution (which I currently use) is to do two queries, one with `haskell_library`, the other with `haskell_binary`, and pick the first result returned.
Another really good solution is:
(kind(haskell_library, //...) union kind(haskell_binary, //...)) intersect rdeps(kind(haskell_library, //...) union kind(haskell_binary, //...), build/rule/haskell/prelude/P.hs, 1)
It ensures that the shortest path is selected with rdeps(..., 1) and picks both haskell_library and haskell_binary. However, there is a problem if the Haskell file is loaded through pre-processing, because the depth of the dependency tree until a haskell_library is found may be more than 2.
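Spelled out as a small helper, the rdeps-based query above could be built like this (a sketch; the depth parameter and rule kinds follow the query as written, with the bazel call itself left commented):

```shell
#!/usr/bin/env bash
# Sketch: build the rdeps-based query selecting the haskell_library or
# haskell_binary that directly owns a source file (depth 1, as above).

make_rdeps_query() {
  local kinds='kind(haskell_library, //...) union kind(haskell_binary, //...)'
  printf '(%s) intersect rdeps(%s, %s, 1)' "$kinds" "$kinds" "$1"
}

# Usage (the bazel call itself is left commented in this sketch):
#   bazel query "$(make_rdeps_query build/rule/haskell/prelude/P.hs)"
make_rdeps_query "build/rule/haskell/prelude/P.hs"
```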
Another note, `query` does not take configuration into account. So, `query` may yield bogus results on cross-platform projects. In the past I have used this cquery to discover the haskell_library/binary/test corresponding to a source file. Any thoughts on this?
No. I just tried cquery and it fails for me (we may have a weird configuration for which it does not work). Fortunately, we do not have any configuration which impacts the dependency tree of Haskell files. Thank you for raising the issue; I'll try to make it work with cquery.
- I can use `from_binary` on the repl without forcing a rebuild of the full codebase.
- If I'm using `from_source`, it is way faster because it does not need to evaluate 1000 Haskell files.

Could you clarify how you set these attributes and what values you set them to? The `@repl` targets are autogenerated with `from_source` set to only include the target at hand. I.e. all dependencies are loaded as packages and, therefore, have to be built. E.g. if the given target depends on all other Haskell targets in the project, then indeed all other targets would need to be built first.
I was unclear. The `@repl` I'm using in the example is our custom repl (actually called `krepl`), which has `from_binary` and `from_source` set to the union of global values and the values set on each local `haskell_library` or `haskell_binary`. (We have a wrapper, `k_haskell_xxx`, which accepts these flags and dispatches to the official `haskell_library`, repl, ....)
Build failure should only be an issue for `from_binary` dependencies. `from_source` dependencies should not be built and any errors in them can be reported by ghci or hls.
Well, some from_source targets are failing because they cannot find some C symbols, which does not happen when using from_binary. That's entirely a problem with our setup that I haven't taken the time to understand.
It now works for haskell_binary and haskell_test, because they can have a repl.
I don't understand this point. `haskell_binary` and `haskell_test` could have a repl target before; in fact rules_haskell autogenerates one for them.
I was unclear. I meant that, afaik, you cannot have a "global" repl which references `haskell_binary` and `haskell_library` at the same time. I may be wrong, however.
Thank you for the links to your experiments, I'll have a look.
Thanks for clarifying and explaining the query.
Well, some `from_source` targets are failing because they cannot find some C symbols, which is not happening when using `from_binary`. That's totally a problem with our setup that I didn't take the time to understand.
Just a hunch: Are these missing libraries coming from external repositories? For the hie-bios file we don't prefix command line library search paths with the execroot. This works fine for local library targets, but external libraries will have entries such as -Lexternal/some_workspace/.... These paths are only valid in the execroot, but not in the repository root. To fix this we'd need to add a prefix, e.g. $RULES_HASKELL_EXEC_ROOT to external paths here and resolve and replace it in the .hie-bios script. There was a similar issue here.
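The suggested fix could be sketched roughly like this (an assumption about how the generated `.hie-bios` script might resolve the prefix, not an implemented feature):

```shell
#!/usr/bin/env bash
# Sketch: rewrite external library search paths emitted for the execroot so
# they stay valid from the repository root, by prefixing them with the
# resolved execroot (here passed explicitly; in a generated .hie-bios script
# it would come from something like $RULES_HASKELL_EXEC_ROOT).

fix_external_paths() {
  printf '%s' "$1" | sed "s|-Lexternal/|-L$2/external/|g"
}

fix_external_paths "-Lexternal/some_workspace/lib" "/path/to/execroot"
```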
I was unclear, I meant that, afaik, you cannot have a "global" repl which references `haskell_binary` and `haskell_library` at the same time. I may be wrong however.
That should work: haskell_repl can have multiple deps, and they can be any of haskell_library/haskell_binary/haskell_test.
Just a hunch: Are these missing libraries coming from external repositories?
No, it's actually not missing libraries but missing symbols.
Good news: I made some progress after reading https://github.com/haskell/haskell-language-server/issues/1160#issuecomment-756566273 ; compiling my haskell-language-server dynamically solves most of my missing-symbol issues in a Template Haskell context.
Hello. Any progress on this?
What's the status here? Is this only a docs issue, or are there some improvements we could make towards better compatibility with HLS?
Hi, I'm also interested in this. Currently I'm replicating the Bazel build in stack just for language server support. Any way we can help? It seems like @guibou did all the work already.
@tonicebrian sorry, I won't help you more here, at work I've stopped using bazel for HLS a few months ago and last week I've merged the removal of bazel from the codebase at work.
Oops, if you don't mind, could you explain why you abandoned bazel? I'm doing the reverse path because I think it will help me in my polyglot PureScript + Haskell project. I don't want to find out 6 months in the future that it wasn't the right choice. What did you swap bazel for?
@tonicebrian Really long story short: Bazel never worked for me; it never delivered on any of its promises of being simple, fast, robust, composable, ... (lots of buzzwords).
We swapped bazel for a builder written entirely in nix. We got the same features in 600 lines of nix, along with reproducible builds, remote builds, and a remote cache, as well as features that bazel was not providing (as far as I know) without patching it or the rulesets.
At a first company, I started experimenting with nix as a build system and had promising results, but I was not in a position to propose a switch (I was hired as a bazel consultant). At another company, I joined when bazel was already the build system of choice. Given my bazel "experience", other developers asked me whether we should switch away from bazel. I decided not to: the situation was acceptable and I did not want to be too involved in build systems. However, after one year I had to fix or work around so many problems similar to those at the previous company that the idea of moving away from bazel resurfaced. Things accelerated recently after we spent a few days tracking down a numerical error in simulation code which turned out to be due to a hermeticity problem in bazel (in short, bazel uses the system library loader in its test runner, which had an impact on our numerical code). I then restarted the nix experiment. It took me a day to write a POC that pushed binaries built with nix into production. We then decided to prioritize the transition.
Do I recommend this approach? It depends. The current benefit is that things work as intended. The main drawbacks are that nix has slower evaluation times (a no-op build is ~1.5s), which is not a problem for us, and that we have a highly specific piece of software that nobody other than me and a few colleagues understands. Is that a problem? Yes. Is it worse than the 3k lines of bazel code + 5 forks of rulesets + 30 open bugs in our bug tracker? I don't know. Would this approach have been possible if I had not fought with bazel for 4 years? Definitely not, because I acquired a lot of knowledge about build systems, building, linking, ghc, gcc, python, ... and how these things interact.
Sorry, my answer is mostly feelings and buzzwords (we may not have the same definitions of what "composable" or "robust" mean). I could elaborate with countless examples of the nightmares I had to fight with bazel, but unfortunately I don't yet have enough feedback on the new nix-based build system to comment on it. Maybe we'll go back to bazel in a few months, who knows. (Note that our nix build system uses BUILD.bazel files as input, and I'm tempted to keep both build systems working in the codebase for a few months, so it should not be difficult to revert this change if we discover a blocker in the future.)
In short, try bazel. If it works for you, be happy. If not, don't hesitate to move to something else.
Thanks for the thorough response. I've just realised that I asked you the same thing on Twitter a couple of weeks ago without noticing that both avatars were the same person 😀
My use case is a server in Haskell and a frontend in PureScript, with some code generation for DTOs and client calls, and lots of microservices. That was the reason for going with bazel: multiple languages, dependencies between artifacts in different languages, and remote caching. BUT rules_purescript is far from satisfactory, and now I'm more than doubtful. I need to learn nix.