racket-langserver icon indicating copy to clipboard operation
racket-langserver copied to clipboard

textDocument/completion behaving strangely

Open benknoble opened this issue 1 month ago • 3 comments

I've got a few possible issues here, and I'm not sure what's at the root, but this is the information I've been able to collect.

I noticed all this because my LSP client receives no completions from the langserver. I used to get accurate completions.

Symptoms

Consider the file https://github.com/benknoble/frosthaven-manager/blob/main/defns/level.rkt (for example, raco pkg install --clone fhm https://github.com/benknoble/frosthaven-manager).

If I navigate to the end of the file and start appending new text, for example le, then trigger completion, then I find the following[^2]:

  • completion params are #hasheq((position . #hasheq((character . 4) (line . 45))) (textDocument . #hasheq((uri . "file:///Users/Knoble/code/frosthaven-manager/defns/level.rkt"))))
  • pos for guessing tokens is 1261
  • token is le, but no completions are found (shouldn't we find level-info and so on?)

Same thing happens for ma, which should complete at least max, match, etc?

[^2]: PS Occasionally I get an "index out of range error" which makes it seem like the open document doesn't have the contents of the Vim buffer and is operating from disk, but that might be a client issue, and I haven't been able to reproduce it with reliability. It might be related to the "sometimes the document text isn't right" issue.

Now, I go to the middle of the file… let's take the list-ref, remove the f, and trigger completion after the e, then we get:

  • completion params are '#hasheq((position . #hasheq((character . 10) (line . 40))) (textDocument . #hasheq((uri . "file:///Users/Knoble/code/frosthaven-manager/defns/level.rkt"))))
  • pos for guessing tokens is 1172
  • sometimes the document text doesn't have the list-re[^1], and so the "token" becomes level-, and no completions are returned (I think we should still get some?). It seems to me this happens most often when I restart the server altogether and trigger completion, so it might be some kind of client issue, but I'm not sure. I don't know how this code cooperates (or not) with clients to look at buffer contents, and it's possible Vim+ALE does something strange with the region of text between where I triggered completion and where ALE determines the completion should start matching.
  • It seems if I wait long enough, then trigger completion, the document text is correct, but the token is (list-re, and no completions are returned. Even removing the opening paren in such a situation (so the token is list-re) gives no results, and same if the token is list.

Now ironically, it seems that if I put the f back in and trigger completion, then the token is (list-ref, and the langserver responds with a large list of possible completions (which my client of course filters down to "only possible completion is list-ref).

That also means that if I trigger completion with an empty token (say, on any blank line), I get results! But I could swear that even last time I was regularly programming in Racket (last fall?) completion worked anywhere. However, if I start a new line (o in Vim) then trigger completion, the same empty token yields no completions?

Also, in the results, I get some of the expected definitions from that module, but I don't get max or match (I do get prop:legacy-match-expander of all things, in addition to the obvious max-level and max-players). So something is going very wrong here.

Device information

  • Old macbook, but hardward shouldn't matter (?)
  • Client: dense-analysis/ale@ca1da76d5e91e42f676654905c0c7b6d074b8068 (for Vim)
  • Racket: v8.17 CS
  • langserver: a9297eddaa3f4b7689e4d1594bf5a3e44cfaaa9a (+ debugging modifications)

Debugging strategy

I thought it might be my client, so I'd added some tracing prints there. I can continue to do so, but I now suspect that at least something is funky in the server. If the server can be exonerated, I can continue to trace the client and report bugs there, but I don't think my other servers are showing the same problem. Rust, for example, works just fine.

I've also added some tracing prints to the Racket server (compile with raco setup -l racket-langserver afterwards so that things aren't bogged down ;) Here's a diff of what I've added so far (adding the with-handlers + debug print to (send doc-trace get-completions) brought things to a halt even with recompilation, but since the final results are empty it must be empty, too).

patch for debugging code (generated with diff, _not_ git diff)
diff ./doc.rkt /Users/Knoble/Library/Racket/8.17/pkgs/racket-langserver/doc.rkt
400a401,405
> (define-syntax-rule (d! x ...)
>     (with-output-to-file
>      #:exists 'append "/tmp/langserver"
>      (λ () x ...)))
> 
401a407,408
>   (d! (println `(pos ,pos)))
>   (d! (println `(text ,(send (Doc-text doc) get-text))))
diff ./text-document.rkt /Users/Knoble/Library/Racket/8.17/pkgs/racket-langserver/text-document.rkt
251a252,256
>   (define-syntax-rule (d! x ...)
>     (with-output-to-file
>      #:exists 'append "/tmp/langserver"
>      (λ () x ...)))
>   (d! (println params))
262c267,281
<                    (send doc-trace get-online-completions (doc-guess-token doc pos))))
---
>                    (let ([x
>                           (with-handlers ([exn:fail?
>                                           (λ (e)
>                                             (d!
>                                              (parameterize ([current-error-port (current-output-port)])
>                                                ((error-display-handler)
>                                                 (exn-message e) e)))
>                                             (raise e))])
>                             (send doc-trace get-online-completions
>                                   (let ([x (doc-guess-token doc pos)])
>                                     (d! (println `(guessed-token ,x)))
>                                     x)))])
>                      (d! (println `(online ,x)))
>                      x)))
>          (d! (println `(comp ,completions)))

With this code applied, I can tail -f /tmp/langserver in one terminal and trigger completion in another. If there is a better strategy for debugging, please let me know. And please let me know what other information would be helpful to collect.

[^1]: It looks like (define (get-level-info level)\n ( level-table level))\n\n in the string, which… that's not right. I didn't even modify the Vim buffer to look like that.

benknoble avatar Nov 28 '25 17:11 benknoble

I'm not sure how this happened. I can't reproduce on vscode.

I installed ALE on vim with AI's help. Type (le at the end of file, then press <C-x> <C-i>, it pops completion normally.

6cdh avatar Nov 29 '25 10:11 6cdh

I'm going to try a couple of things and will share back results:

  • reproduce on the machine + commits above with a minimal vimrc (with --clean) that I can share more easily
  • reproduce on HEAD Racket (will share commit hash) {on same machine,on newer machine}

Hopefully that will give me more information to work with.

benknoble avatar Dec 02 '25 01:12 benknoble

So far, I've been able to reproduce this on a simpler program (?) with Racket version e8864186e9 (configure: add --enable-sofind=<conv> and /sofind <conv>, 2025-12-01) on the same machine with unmodified langserver (i.e., no custom code from original issue). That does mean I didn't confirm the same server behavior, only that the client was getting no completion results for the couple of things I tried.

I still want to trim my Vim config to something minimal and try across a few machines.

benknoble avatar Dec 04 '25 23:12 benknoble

Ok, I have also reproduced this with a minimal vimrc on the machine where I originally had the program. Specs:

  • Racket commit e8864186e9 (configure: add --enable-sofind=<conv> and /sofind <conv>, 2025-12-01)

  • ALE commit ca1da76d (add markdownlint fixer for markdown (#5066), 2025-11-22)

  • racket/bin/raco pkg show -ld racket-langserver:

    Installation-wide:
     Package              Checksum                                    Source                                                                                   Directory
     racket-langserver    d8c986e8240481d187b81b0ea65a65483fba2ca1    (catalog "racket-langserver" "https://github.com/jeapostrophe/racket-langserver.git")    "/Users/Knoble/code/racket/racket/share/pkgs/racket-langserver"
    User-specific for installation "development":
     [none]
    

Steps:

  • Clone ALE somewhere (e.g., mkdir -p /tmp/pack/test/opt && git clone https://github.com/dense-analysis/ale /tmp/pack/test/opt/ale)

  • Create some file (minimal.vimrc) with following contents:

    " load with vim --clean -u <this-file>
    if &compatible
      set nocompatible
    endif
    filetype on
    
    " Replace with a directory containing pack/<anything>/opt/ale, which contains a
    " clone of https://github.com/dense-analysis/ale.
    " I have /tmp/pack/test/opt/ale/ at commit ca1da76d (add markdownlint fixer for
    " markdown (#5066), 2025-11-22).
    set packpath+=/tmp/
    packadd ale
    
    augroup ale_vimrc
      autocmd User ALELSPStarted setlocal omnifunc=ale#completion#OmniFunc
    augroup END
    
  • Run Vim with the minimal vimrc (and optionally built Racket, prepending PATH=~/code/racket/racket/bin:$PATH or similar to the following command): vim --clean -u minimal.vimrc some.rkt, then insert #lang racket\n\nm and trigger omni-completion (C-x C-o, once ALE loads the LSP anyway; see :ALEInfo to know if it's been started, and check :verbose setlocal omnifunc? to see if the autocommand has run), but no results are provided. If I finish the word (e.g., max) and run :ALEHover, I get the documentation.

benknoble avatar Dec 13 '25 20:12 benknoble

Yes, I can reproduce it.

At first, I thought it was a concurrency bugs. After some debug, I found this bug comes from how ALE send document change event.

After user change the document, the editor sends a document change event. It's either a full document change event or incremental change event. Spec describes here.

In this example, VSCode sends incremental change event. ALE sends full document change event. Here are how we handle two events:

https://github.com/jeapostrophe/racket-langserver/blob/d8c986e8240481d187b81b0ea65a65483fba2ca1/text-document.rkt#L117-L123

For the incremental change event, we update the internal text buffer, and do some tweak for the existing analysis results.

But for the full document change event, we clear the document and all existing analysis results, then insert the new document text as if it's a just opened file.

After each document change event, lsp tries to expand the updated code, and almost all function work on the expanded code.

So there was a moment code looks like this:

#lang racket

Then expand succeed, and lsp save the analysis results.

But then the code becomes

#lang racket

m

Expand would fail because m is undefined. In this case, lsp will reuse the existing analysis results. Because ALE sent full document change, all existing results are removed. So lsp returns empty results. For VSCode, it returns based on the existing results.

6cdh avatar Dec 14 '25 04:12 6cdh

Currently I'm not sure how to make the full document change reuse the existing results. Probably only make completion service keep results after reset. That would make completion not precise, for example, provide not imported or undefined names.

6cdh avatar Dec 14 '25 04:12 6cdh

Maybe we can associate each symbol in results a live counter, each time we rebuild results, we merge old result with increased count, imagine below:

For a racket document

#lang racket
(define a 1)

SERVER create completion logs:

(a , 0)

a document change in, add a new definition b, so SERVER build completion logs again, but also fuse the logs from previous build, hence:

(a , 1)
(a , 0)
(b , 0)

Those entries live too long will be ruled out when fusing (e.g. count = 5).

There will still have ill-symbol, or because live count is short hence normal symbols are lost, but ill-symbol will not exist forever, and normal symbols should eventually be recovered by users.

dannypsnl avatar Dec 14 '25 08:12 dannypsnl

I've bisected the last 1.5years worth of ALE commits, and none of them seem to work. I find that strange, since I'm pretty sure that at some point in that period I had working completion.

benknoble avatar Dec 16 '25 21:12 benknoble

Er, scratch that: apparently nothing back to dcec4b3c (Fix 3998 - add language option to uncrustify fixer (#4007), 2021-12-25) works with my current Racket + langserver, so that's… weird. Has the langserver had any changes to completion in the last few years? I know this was working at some point.

I can try to bisect the langserver at some point, but that's going to take me longer to setup and run.

benknoble avatar Dec 16 '25 21:12 benknoble

Those entries live too long will be ruled out when fusing (e.g. count = 5).

Is it a generation based design? By assigning each item a generation tag, and retire too old items.

But the problem is, a successful expand can replace all existing items. If a expand fails, that means user is typing incrementally, for example, m -> ma -> max. There are no explicit generation boundary.

I find that strange, since I'm pretty sure that at some point in that period I had working completion.

I checked the old code. They didn't reset the existing completion items. It changed during a later refactor. So I think bring back the old behavior is ok.

6cdh avatar Dec 20 '25 01:12 6cdh

I can open a PR if you all agree this or have more suggestions.

6cdh avatar Dec 20 '25 01:12 6cdh

But the problem is, a successful expand can replace all existing items

I don’t understand, can you elaborate this process?

dannypsnl avatar Dec 20 '25 11:12 dannypsnl

Ok, I will explain in details.

When a user types this code

#lang racket

After each character (depending on editor settings), the editor sends a document change event to server. The server updates its own buffer, and attempt to expand the code. The expand might fail. For this code,

#lang racke

The server tries to expand but fails. Then the user types t, the code becomes:

#lang racket

The server tries to expand this, and succeeds.

The completion service runs on the expanded code, return a list of string, means the completion items. Once the server expands successfully, it calls completion service to generate latest version of completion items and discard the old items. But if the expand fails, the completion service cannot run. Then the server have to reuse the old version of completion items which comes from the last successful expand. If the expand never succeed, the completion items is always empty.

Because ALE sends full document change event, the server thinks this document needs a reset, and calls the reset method of all services. For completion service, its reset method clears the saved completion items which comes from the last successful expand. That's why this issue appears.

My plan is to make the reset method of completion service does not clear the saved items.

6cdh avatar Dec 20 '25 13:12 6cdh

I'm not sure if I have totally understood the whole conversation, but I think a hybrid where we only reset saved completion items on a successful expand (and keep them on unsuccessful expands) might work?

It's good to know that we think the issue boils down to a change in the language server! @6cdh if you point me at the refactoring, I should be able to test before/after to see if that's the culprit on my end.

benknoble avatar Dec 20 '25 14:12 benknoble

I probably not properly describe the scenario. But the PR should fix this issue.

6cdh avatar Dec 20 '25 15:12 6cdh

@6cdh With the current architecture, after successful expansion, old completion items are discarded. I agree with that.

However, my suggestion is that using a generational design would allow us to retain them. What I have in mind currently is

  1. If expansion failed, then the old items are retained
  2. If expansion succeeds, push new items and let old items age by one generation
    • If a symbol appears again in a new expansion, just reset its generation instead of creating a duplicate
  3. Whenever an item reaches its life limit (which we can rely on @benknoble tries to ensure a nice value), we remove it from the completion items

Do you mean even if we apply that, still can't solve the problem?

Even if we don't reset, I think we should follow @benknoble's consideration: only skip reset when expansion fails, not skip reset entirely.

dannypsnl avatar Dec 20 '25 16:12 dannypsnl

When expansion succeeds, the code creates a brand-new empty doc-trace object initialized with the latest buffer and expanded code, then discards the old doc-trace. This new trace contains freshly instantiated service objects, and its associated completion service starts in a clean, empty state. New items are generated immediately from this fresh state.

https://github.com/jeapostrophe/racket-langserver/blob/d8c986e8240481d187b81b0ea65a65483fba2ca1/check-syntax.rkt#L150

In summary, the successful expansion case requires no special handling: everything is reinitialized from scratch, producing complete and correct items. There is no need to preserve or merge previous items.

The pseudocode for handling document change events looks like this:

(match document-change
  [(Insert position text)
   (send current-doc insert position text)]
  [(Delete range)
   (send current-doc delete range)]
  [(FullText new-text)
   (send current-doc reset) ; reset all services
   (send current-doc clear-all-text)
   (send current-doc insert 0 new-text)])

(thread
  (λ ()
    (define new-doc-trace (expand-and-run-all-services current-doc))
    (when new-doc-trace
      (send current-doc replace-trace! new-doc-trace))))

Although the reset method is invoked on all services during a full-text update, it has no lasting impact in the successful expansion case. Shortly afterward, a completely new doc-trace—containing entirely fresh service instances—is created and installed, fully replacing the previous one.

6cdh avatar Dec 21 '25 02:12 6cdh