languageserver icon indicating copy to clipboard operation
languageserver copied to clipboard

Configurable source dirs for file parsing

Open MilesMcBain opened this issue 4 years ago • 15 comments

It is a well known issue that the languageserver only parses files other than those that it is notified are opened in the case that it detects the workspace root is a package. https://github.com/REditorSupport/languageserver/issues/253

This means that the way that workspace symbols and definitions work are not how many people expect and seem to deviate a little from the protocol specification: https://microsoft.github.io/language-server-protocol/specification#workspace_symbol.

There is also a related issue that, even in open files, source() is not understood as something that affects the workspace namespace: https://github.com/REditorSupport/vscode-r-lsp/issues/65

I believe a pretty cheap solution to this is to allow users to configure directories that the server should parse E.g. options(languageserver.r.source.dirs = c("R", "scripts") would cause workspace$load_all() parse R files in those two top level folders, even if it cannot identify the project as a package.

Another possibly nicer alternative is for the option to work as an exclusion list, so parsing starts at "." and any paths containing folders with names defined in the option are ignored. But there seems to a lot of inertia behind the "no parse by default" position.

The implementation would be pretty cheap:

in:

on_initialized <- function(self, params) {
    logger$info("on_initialized")
    project_root <- self$rootPath
    if (length(project_root) && is_package(project_root)) {
        # a bit like devtools::load_all()
        self$workspace$load_all(self)
        # TODO: result lint result of the package
        # lint_result <- lintr::lint_package(rootPath)
    }

workspace$load_all could take a vector of paths to parse.

E.g. workspace$load_all(self, source_paths)

So then we'd just need to build the vector of paths which is whatever languageserver.r.source.dirs says unioned with "R" in the case that the project is a package.

source_paths <- union(
  getOption("languageserver.r.source.dirs", NULL),
  if (is_package(project_root)) "R" else NULL)
) 

The change to workspace$load_all is minimal too, since file.path and list.files are already vectorised:

        load_all = function(langserver, source_paths) {
            source_dirs <- file.path(self$root, source_paths)
            files <- list.files(source_dirs, pattern = "\\.r$", ignore.case = TRUE)
            for (f in files) {
                logger$info("load ", f)
                path <- file.path(source_dir, f)
                uri <- path_to_uri(path)
                doc <- Document$new(uri, NULL, stringi::stri_read_lines(path))
                self$documents$set(uri, doc)
                # TODO: move text_sync to Workspace!?
                langserver$text_sync(uri, document = doc, parse = TRUE)
            }
            self$import_from_namespace_file()
        },

I'd also argue for recursive = TRUE in list.files.

MilesMcBain avatar Mar 09 '21 05:03 MilesMcBain

I think the idea of using an option to control the default sources makes good sense, given that I was trying to implement https://github.com/REditorSupport/languageserver/pull/382 to handle source() calls but soon find it much trickier than expected in the following perspectives:

  1. source() could be recursive. Handling source() by opening and closing documents in workspace requires that a dependency graph be well maintained, being aware of whether each document is open or not open but sourced by other open documents. Handling this on document change could be expensive, only handle source graph on document open and close might make more sense.
  2. If the source graph is maintained, then it might make more sense to provide completion and other features based on the document and all sources it depends on. In this way, we don't need to open and close documents in the workspace to control the scope of language providers.

With these thinking, handling source() in an sensible way is not easy. An option to control default sources could solve much of the problem.

renkun-ken avatar Mar 11 '21 03:03 renkun-ken

Great!

What do you think about having an exclusion list in addition to the source dirs list? It would allow something like:

options(languageserver.r.source.dirs = ".", languageserver.excluded.source.dirs = c("renv", ".drake")) etc.

A bit of fancy footwork somewhere to avoid parsing R folder twice in some cases, possibly just by making files to be parsed unique().

MilesMcBain avatar Mar 11 '21 03:03 MilesMcBain

I guess an important use case is to specify a regex or glob pattern to choose a collection of files, both could be handled by fs::dir_ls().

renkun-ken avatar Mar 11 '21 03:03 renkun-ken

Also, I find Sys.glob() useful to specify a glob pattern to specify a collection of files:

> Sys.glob("*.R")                                                                          
character(0)

> Sys.glob("**/*.R")                                                                       
 [1] "R/call_hierarchy.R"        "R/capabilities.R"          "R/color.R"                
 [4] "R/completion.R"            "R/definition.R"            "R/diagnostics.R"          
 [7] "R/document.R"              "R/folding.R"               "R/formatting.R"           
[10] "R/handlers-general.R"      "R/handlers-langfeatures.R" "R/handlers-textsync.R"    
[13] "R/handlers-workspace.R"    "R/highlight.R"             "R/hover.R"                
[16] "R/interfaces.R"            "R/languagebase.R"          "R/languageclient.R"       
[19] "R/languageserver.R"        "R/link.R"                  "R/log.R"                  
[22] "R/namespace.R"             "R/protocol.R"              "R/references.R"           
[25] "R/rename.R"                "R/selection.R"             "R/session.R"              
[28] "R/signature.R"             "R/symbol.R"                "R/task.R"                 
[31] "R/utils.R"                 "R/workspace.R"             "man-roxygen/document.R"   
[34] "man-roxygen/id.R"          "man-roxygen/position.R"    "man-roxygen/self.R"       
[37] "man-roxygen/tdpp.R"        "man-roxygen/uri.R"         "man-roxygen/workspace.R"  
[40] "tests/testthat.R"         

> Sys.glob("R/*.R")                                                                        
 [1] "R/call_hierarchy.R"        "R/capabilities.R"          "R/color.R"                
 [4] "R/completion.R"            "R/definition.R"            "R/diagnostics.R"          
 [7] "R/document.R"              "R/folding.R"               "R/formatting.R"           
[10] "R/handlers-general.R"      "R/handlers-langfeatures.R" "R/handlers-textsync.R"    
[13] "R/handlers-workspace.R"    "R/highlight.R"             "R/hover.R"                
[16] "R/interfaces.R"            "R/languagebase.R"          "R/languageclient.R"       
[19] "R/languageserver.R"        "R/link.R"                  "R/log.R"                  
[22] "R/namespace.R"             "R/protocol.R"              "R/references.R"           
[25] "R/rename.R"                "R/selection.R"             "R/session.R"              
[28] "R/signature.R"             "R/symbol.R"                "R/task.R"                 
[31] "R/utils.R"                 "R/workspace.R"            

> Sys.glob(c("R/*.R", "tests/**/*.R"))                                                     
 [1] "R/call_hierarchy.R"                    "R/capabilities.R"                     
 [3] "R/color.R"                             "R/completion.R"                       
 [5] "R/definition.R"                        "R/diagnostics.R"                      
 [7] "R/document.R"                          "R/folding.R"                          
 [9] "R/formatting.R"                        "R/handlers-general.R"                 
[11] "R/handlers-langfeatures.R"             "R/handlers-textsync.R"                
[13] "R/handlers-workspace.R"                "R/highlight.R"                        
[15] "R/hover.R"                             "R/interfaces.R"                       
[17] "R/languagebase.R"                      "R/languageclient.R"                   
[19] "R/languageserver.R"                    "R/link.R"                             
[21] "R/log.R"                               "R/namespace.R"                        
[23] "R/protocol.R"                          "R/references.R"                       
[25] "R/rename.R"                            "R/selection.R"                        
[27] "R/session.R"                           "R/signature.R"                        
[29] "R/symbol.R"                            "R/task.R"                             
[31] "R/utils.R"                             "R/workspace.R"                        
[33] "tests/testthat/helper-utils.R"         "tests/testthat/test-call-hierarchy.R" 
[35] "tests/testthat/test-codeunits.R"       "tests/testthat/test-color.R"          
[37] "tests/testthat/test-completion.R"      "tests/testthat/test-definition.R"     
[39] "tests/testthat/test-folding.R"         "tests/testthat/test-formatting.R"     
[41] "tests/testthat/test-highlight.R"       "tests/testthat/test-hover.R"          
[43] "tests/testthat/test-langauagecilent.R" "tests/testthat/test-link.R"           
[45] "tests/testthat/test-lintr.R"           "tests/testthat/test-null-root.R"      
[47] "tests/testthat/test-references.R"      "tests/testthat/test-rename.R"         
[49] "tests/testthat/test-search.R"          "tests/testthat/test-selection.R"      
[51] "tests/testthat/test-signature.R"       "tests/testthat/test-stdio.R"          
[53] "tests/testthat/test-symbol.R"          "tests/testthat/test-tcp.R"            
[55] "tests/testthat/test-unicode-path.R"

> Sys.glob("**/language*.R")                                                               
[1] "R/languagebase.R"   "R/languageclient.R" "R/languageserver.R"

renkun-ken avatar Mar 11 '21 14:03 renkun-ken

Okay I quite like this. So the options could be globs? I'm going to have a go at an implementation and see how this feels.

MilesMcBain avatar Mar 13 '21 10:03 MilesMcBain

Not quite ready for a PR, but it seems to be working well: https://github.com/MilesMcBain/languageserver/tree/config-src-globs

MilesMcBain avatar Mar 15 '21 06:03 MilesMcBain

Please feel free to send a PR anytime.

renkun-ken avatar Mar 15 '21 06:03 renkun-ken

@MilesMcBain Do you intend to finish this PR? If not, what were the issues that you ran into?

sebffischer avatar Dec 27 '21 10:12 sebffischer

The main issue I had was writing tests for the code. The change itself wasn't too hard although code mat have drifted now.

MilesMcBain avatar Dec 27 '21 10:12 MilesMcBain

@sebffischer to answer the first part of the question for you, I'm not currently working on it and have no plans to pick it up.

MilesMcBain avatar Dec 27 '21 10:12 MilesMcBain

@randy3k & @renkun-ken , Just wanted to say it would be amazing if someone could pick this up again :( This limitation in particular causes pain when writing large packages with significant testthat/ code in vscode as it means you can't reliably depend on the reference finding when re-factoring code + tests.

Even if the full feature isn't implemented it would be great to have tests/ added as a default directory that is scanned alongside the R/ directory.

(obligatury thank you guys for your work, this package makes a huge difference to doing R based development)

gowerc avatar Feb 03 '22 09:02 gowerc

I'd like to summarize what we need to consider for this issue:

  • Where should we configure this? options(), a standalone config file (e.g. .languageserver.yml), or an existing config file <project>.Rproj?
  • How should we configure this? Through a vector of regex or glob patterns? What about exclude?
  • When should we resolve this configuration? The regex or glob patterns should be resolved not only on language server startup, but also on each workspace file changes via workspace/didChangeWatchedFiles handled by workspace_did_change_watched_files, which includes creating/renaming/removing a file or directory in the workspace folder.

renkun-ken avatar Feb 03 '22 10:02 renkun-ken

I mean my personal preferance would be to go down the route similar to .gitignore where you just have a flat text file listing paths and use a ! prefix to indicate "not" i.e.

**/*.R
!R/xxx_*.R

gowerc avatar Feb 03 '22 10:02 gowerc

I'm digging up this issue because that's a feature that most (if not all) people of our research team want/need. The way our projects are organized makes it very difficult to use langagueserver in a useful way. I wrote an ugly fix for myself sometimes ago (pretty much recursively loading all R files in the workspace) but since several colleagues asked me for my hacky version of the package, I decided to clean it and make it publicly available.

I made the following choices for the implementation:

  • The user can add a list of folder to include through a new option languageserver.source_dirs which should be a list of folder's path
  • If languageserver.source_dirs = list() the previous behavior is conserved (the language serrver tries to detect if the workspace is a package)
  • For now, it is not possible to use globs
  • Internally, the server keeps a list of the files that are tracked (and not a list of the folders) so it would be easy to add the option for globs (through another option?)

I do have a working version in my fork at https://github.com/celbig/languageserver but no tests for now. If you are interested, I'm willing to do a proper PR with your suggestions and requirements. Considering tests, I'm not sure of what to test but I'm willing to implement whatever you find necessary.

I hope this can help enhancing this package since this feature seems to be requested regularly

celbig avatar Jun 20 '23 12:06 celbig