Add support for LSIF

Open chrismwendt opened this issue 6 years ago • 6 comments

LSIF (Language Server Index Format) is a format for precomputing LSP data (definition, reference, and hover results, etc.) and saving it to a file for later use by code navigation tools.
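To make the format concrete, here is a minimal sketch of what an LSIF dump looks like: one JSON object per line, each a vertex or an edge, tying a reference range to its definition through a shared result set. The field names are recalled from the LSIF 0.4 specification and should be checked against it; `emit_lsif` and `pos` are hypothetical helper names, and the file URI is made up.

```python
import json

def pos(line, character):
    """An LSP-style zero-based position."""
    return {"line": line, "character": character}

def emit_lsif(uri, def_range, ref_range):
    """Return LSIF lines (JSON strings) linking one reference to one definition.

    def_range and ref_range are (start, end) pairs of positions in the same file.
    Field names follow LSIF 0.4 as best recalled; verify against the spec.
    """
    elements = []

    def add(kind, label, **fields):
        element = {"id": len(elements) + 1, "type": kind, "label": label, **fields}
        elements.append(element)
        return element["id"]

    add("vertex", "metaData", version="0.4.0",
        projectRoot="file:///", positionEncoding="utf-16")
    doc = add("vertex", "document", uri=uri, languageId="ocaml")
    result_set = add("vertex", "resultSet")
    d = add("vertex", "range", start=def_range[0], end=def_range[1])
    r = add("vertex", "range", start=ref_range[0], end=ref_range[1])
    # Both ranges point at the same result set, so they share results.
    add("edge", "next", outV=d, inV=result_set)
    add("edge", "next", outV=r, inV=result_set)
    def_result = add("vertex", "definitionResult")
    add("edge", "textDocument/definition", outV=result_set, inV=def_result)
    add("edge", "item", outV=def_result, inVs=[d], document=doc)
    add("edge", "contains", outV=doc, inVs=[d, r])
    return [json.dumps(e) for e in elements]

lines = emit_lsif("file:///import.ml",
                  (pos(0, 4), pos(0, 7)),
                  (pos(3, 10), pos(3, 13)))
```

Writing `"\n".join(lines)` to a file gives a dump that a code navigation tool can load without running a language server.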

I'm interested in adding LSIF support to Merlin. By digging around the code a bit, I've been able to gain some familiarity with OCaml (and I'm liking the language 😸), but I haven't been as successful at understanding how Merlin fits together and which pieces I can reuse.

Relevant pieces appear to be Typedtrie, Browse_tree, MBrowse, Env, MTyper, and MPipeline. I'm not sure how they all relate to one another at a high level (e.g. what are they and how do they map to OCaml modules, files, types, build config, etc.? How does one use them together?). It's not clear to me exactly what a Path is. It doesn't seem to have anything to do with file paths. It seems to be either an identifier, record access, or a function call.

My current plan is to:

  • Figure out how to iterate over files in the current project. list-modules looks useful, but I haven't been able to get it to return results on the command line. Typedtrie.of_browse looks useful, too.
  • For each file: recursively walk the Browse_tree (like shape does)
  • For each identifier, somehow call locate, and add that (definition, reference) pair to the LSIF output
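The three steps above can be sketched as a single loop. This is a Python illustration of the data flow only, not Merlin code: `identifiers_in` and `locate` are hypothetical callbacks standing in for the Browse_tree walk and Merlin's locate query, and their signatures are assumptions.

```python
from collections import defaultdict

def index_project(files, identifiers_in, locate):
    """Group reference sites by the definition they resolve to.

    files: iterable of file paths in the project.
    identifiers_in(path): hypothetical; yields (line, col) for each identifier.
    locate(path, line, col): hypothetical stand-in for Merlin's locate;
        returns (def_path, def_line, def_col), or None if nothing is found.
    """
    refs_by_def = defaultdict(list)
    for path in files:
        for line, col in identifiers_in(path):
            definition = locate(path, line, col)
            if definition is not None:
                refs_by_def[definition].append((path, line, col))
    return dict(refs_by_def)

# Toy stand-ins so the sketch runs without Merlin.
def fake_identifiers(path):
    return [(0, 4), (2, 8)] if path == "a.ml" else [(1, 0)]

def fake_locate(path, line, col):
    # Pretend every identifier except a.ml:(2, 8) is defined at a.ml:(0, 4).
    return None if (path, line, col) == ("a.ml", 2, 8) else ("a.ml", 0, 4)

index = index_project(["a.ml", "b.ml"], fake_identifiers, fake_locate)
```

Keying the table by definition means each entry maps directly onto one LSIF result set: the key becomes the definition range and the list becomes its references, including those from other files.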

If anyone with more OCaml/Merlin experience could chime in with tips, suggestions, or a different way of accomplishing this, that would be great!

cc @trefis

chrismwendt avatar Sep 10 '19 03:09 chrismwendt

Hey @chrismwendt :) I've been working on this on the side in my personal time as a prototype/in a fork. Extracting types and definitions from merlin in my fork, and then exporting to LSIF, is largely done, see: https://github.com/rvantonder/lsif-ocaml. I've been holding out on announcing initial support.

My approach was to modify merlin's internals to make dumping merlin types and definitions simple and efficient compared to repeatedly querying ocamlmerlin-server. Those interested can see the modifications to merlin and notes in my fork and PR here: https://github.com/rvantonder/merlin/pull/3. Presently, pinning and installing the version of merlin in my branch will not conflict with the standard merlin binary, so it can be used safely with lsif-ocaml, as documented.

As mentioned in the discussion in my fork, I first roughly tokenize the source and then query and dump type/definition information. Merlin maintainers who are more familiar with the internal data structures/AST may find a more efficient way to export (or directly dump) LSIF data. That may take a bit longer to support in merlin compared to the prototype lsif-ocaml tool I link above (and presently, the approach I take seems fast enough). Happy to discuss how this could integrate into merlin proper.

rvantonder avatar Sep 10 '19 05:09 rvantonder

Hey @rvantonder, that's awesome! Thanks for all the links and context. I remember you working on this a while back, and it's great to see it in a repo now. This looks very promising - I'll dive in first thing tomorrow 😸

chrismwendt avatar Sep 10 '19 05:09 chrismwendt

> My current plan is to:

@trefis Any thoughts on this section and my plan for only visiting interesting AST nodes, rather than every punctuation character?

chrismwendt avatar Sep 13 '19 21:09 chrismwendt

Status: I got cross-file hovers/defs/refs using LSIF working on Sourcegraph.com. Here's an example repo with patdiff and its dependencies: https://sourcegraph.com/github.com/chrismwendt/patdiff-and-deps/-/blob/patdiff/lib/import.ml#L4:5&tab=references

Implementation: I used https://github.com/rvantonder/merlin/pull/3, but not https://github.com/rvantonder/lsif-ocaml (I needed to add support for cross-file defs and refs, and I'm not fluent enough in OCaml to do that quickly). I replaced lsif-ocaml with:

  • https://github.com/sourcegraph/merlin-to-coif converts intermediate files generated by https://github.com/rvantonder/merlin/pull/3 into another format called CoIF (like LSIF, but simpler)
  • https://github.com/sourcegraph/coif-to-lsif converts CoIF to LSIF

Known issues: There are a few known issues that could get ironed out with more effort:

  • The OCaml LSIF indexer still doesn't walk the AST. It naively looks to the left and right of all punctuation characters in each file and calls the internal dispatch function in Merlin for each.
  • Only 1 character of the definition in the references panel is highlighted (the whole symbol should be highlighted). Perhaps Merlin could be patched to provide the definition range rather than just the position.
  • Some files have almost no code intel data (e.g. patdiff/lib/float_tolerance.ml). I haven't determined the cause of this yet, but I'm guessing Merlin just needs a bit more configuration of some kind.
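For the second issue above (only one character of the definition highlighted), one possible stopgap outside Merlin is to widen the reported position to the surrounding identifier using the source text. The sketch below is a purely textual heuristic and an assumption on my part, not how Merlin computes ranges: `identifier_range` is a hypothetical helper, and it ignores operators and other non-alphanumeric names.

```python
import re

# Characters allowed inside an ordinary OCaml identifier (letters, digits,
# underscore, apostrophe). Operator names are deliberately not handled.
IDENT_CHAR = re.compile(r"[A-Za-z0-9_']")

def identifier_range(line_text, col):
    """Widen a single column to the (start, end) columns of the identifier
    that contains it; end is exclusive. Returns None if col is not on an
    identifier character."""
    if col < 0 or col >= len(line_text) or not IDENT_CHAR.match(line_text[col]):
        return None
    start = col
    while start > 0 and IDENT_CHAR.match(line_text[start - 1]):
        start -= 1
    end = col + 1
    while end < len(line_text) and IDENT_CHAR.match(line_text[end]):
        end += 1
    return (start, end)

span = identifier_range("let float_tolerance = 1", 4)
```

Having Merlin report the range itself would be more reliable, since it knows the actual token boundaries.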

chrismwendt avatar Sep 19 '19 15:09 chrismwendt

Glad the merlin fork helped. Re known issues: FWIW, the naive approach I used should capture everything merlin knows. It queries on and around punctuation like parentheses (which merlin will type for the enclosed expression), as well as all tokens around whitespace. If you loaded the file in an editor and asked for the first type at every character position, the results would be the same. I'd be somewhat surprised (in a good way) if something is missed, but I can validate this a bit later. The real opportunity is how much faster it could be with a visitor.
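A rough sketch of that position-selection idea, for readers who want to see it concretely. This is not the code from the fork: the `PUNCTUATION` set and the `candidate_columns` helper are assumptions chosen for illustration.

```python
# Hypothetical punctuation set; the fork's actual tokenization may differ.
PUNCTUATION = set("()[]{};,.:|=<>+-*/@^!&~")

def candidate_columns(line_text):
    """Columns worth querying on one line: each punctuation character and its
    two neighbours, plus the characters on either side of whitespace (the
    boundaries of tokens). Whitespace columns themselves are dropped."""
    cols = set()
    for i, ch in enumerate(line_text):
        if ch in PUNCTUATION:
            cols.update((i - 1, i, i + 1))
        elif ch.isspace():
            cols.update((i - 1, i + 1))
    return sorted(
        c for c in cols
        if 0 <= c < len(line_text) and not line_text[c].isspace()
    )

columns = candidate_columns("f (x y)")
```

Each selected column would then be sent to Merlin as a query. Long identifiers contribute only their boundary columns, which is where the savings over querying every character come from; a visitor over the typed tree would avoid the remaining redundant queries.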

rvantonder avatar Sep 19 '19 17:09 rvantonder

Should this be closed as something very unlikely to happen in merlin? (Perhaps it belongs in ocaml-lsp or some derived project.)

ulugbekna avatar Dec 10 '21 23:12 ulugbekna