javascript-typescript-langserver icon indicating copy to clipboard operation
javascript-typescript-langserver copied to clipboard

Stop caching all workspace files on startup when using the local filesystem

Open keyboardDrummer opened this issue 6 years ago • 5 comments

Issues

Resolves issues #471, #477 and #463

Description

Previously when using a LocalFileSystem, all files in the workspace with Typescript related extensions were cached in memory on startup. This causes huge startup times in large workspace directories. This PR resolves that by only reading files from disk when they are needed.

There are a few scenario's in which we must still scan through all files in the workspace:

  • There are two scenario where all tsconfig.json and jsconfig.json files in the workspace are read. This only happens on the first request. The two scenario's are:
    • When sending a workspace/symbol request
    • When sending a workspace/xreferences request
  • There are two scenario's in which all packages in the workspace are read. This only happens on the first request. The two scenario's are:
    • When sending a workspace/symbol request and using SourceGraph's LSP extension that passes a package name.
    • When sending a workspace/xreferences request and passing a dependeePackageName value.

Testing

On the one hand this is purely a performance change, so no new tests are expected. However, since the Remote and Local file system now work more differently than before, I've made all the tests from typescript-service.test now also run using local file system.

Future work

  • Avoid scanning all workspace files, at least for all standard LSP requests. Currently that is not the case for the workspace/symbol query request. The reason is that this LSP server allows for multiple projects to exist inside of the workspace, so it must scan to workspace to find these projects. However, the LSP protocol does not require this. You may assume the workspace root is the root of a single project. For handling multiple projects with one LSP server, a client can use the recently introduced workspace folders feature. We should change this LSP server so it uses that instead of scanning the root directory.

  • Resolve the mismatch betweens Typescript's synchronous LanguageServiceHost API and SourceGraph's asynchronous files extension for LSP. The solution is for Typescript's API to become asynchronous.

keyboardDrummer avatar Jun 13 '18 15:06 keyboardDrummer

This definitely fixes the warmup delay in #477

Kronuz avatar Jun 20 '18 19:06 Kronuz

Codecov Report

Merging #472 into master will increase coverage by 0.72%. The diff coverage is 94.4%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #472      +/-   ##
==========================================
+ Coverage   83.08%   83.81%   +0.72%     
==========================================
  Files          15       16       +1     
  Lines        2052     2113      +61     
  Branches      420      488      +68     
==========================================
+ Hits         1705     1771      +66     
+ Misses        345      342       -3     
+ Partials        2        0       -2
Impacted Files Coverage Δ
src/util.ts 90% <100%> (+0.22%) :arrow_up:
src/fs.ts 100% <100%> (+9.58%) :arrow_up:
src/match-files.ts 81.77% <100%> (ø) :arrow_up:
src/updater.ts 87.23% <87.23%> (ø)
src/memfs.ts 93.22% <88.23%> (+2.4%) :arrow_up:
src/project-manager.ts 87.65% <94.25%> (+0.58%) :arrow_up:
src/packages.ts 95.83% <94.59%> (+0.65%) :arrow_up:
src/typescript-service.ts 85.46% <96.15%> (+0.32%) :arrow_up:
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d107424...d2b14f5. Read the comment docs.

codecov[bot] avatar Jun 25 '18 08:06 codecov[bot]

The reason is that this LSP server allows for multiple projects to exist inside of the workspace, so it must scan to workspace to find these projects. However, the LSP protocol does not require this. You may assume the workspace root is the root of a single project. For handling multiple projects with one LSP server, a client can use the recently introduced workspace folders feature. We should change this LSP server so it uses that instead of scanning the root directory.

I don't think we can do that - a user doesn't necessarily have each project in its own workspace folder, but may add a monorepo as a single workspace folder. For Sourcegraph, we don't know what "defines" a project in a repo, so every connection passes the repo root as the root URI. Only the langserver has the knowledge to know where to draw language-specific project boundaries - but we have to work with the limitations of LanguageServiceHost being sync. If a client wants to reduce the number of files scanned, it is free to spawn one language server instance for a subfolder that only contains one project (or use workspaceFolders if we add support for it).

felixfbecker avatar Jun 25 '18 20:06 felixfbecker

I have trouble understanding the code architecture after this change. Before I believe the code followed SRP very well: InMemoryFileSystem was the synchronous store that TS would talk to. It was populated by FileSystemUpdater, which synced files to it from any implementation of the async FileSystem interface - either RemoteFileSystem or LocalFileSystem.

I've refactored the original PR. I expect it to be more clear now.

The core of the change is that when using a local files system, file contents are no longer stored in memory. Previously InMemoryFileSystem did these two things:

  1. ensure that files in the filesystem can be served synchronously
  2. overlay the file system with files that aren't saved yet.

This class has been split into two. Responsibility (2) now lies with OverlayFileSystem. Responsibility (1) lies with the SynchronousFileSystem interface. The latter has two implementation:

  • LocalFileSystem. Does not store file contents in memory. Uses synchronous methods from fs to serve files. Used in conjunction with the NoopFileSystemUpdater, which does do any updating since all the files are already present. Note that Microsoft's LanguageService back-end already caches all file contents it requests, so performance wise there is no need for the LanguageServiceHost to do the same.
  • InMemoryFileSystem. Stores files in memory. Used in conjunction with RemoteFileSystemUpdater which downloads files from a RemoteFileSystem and stores them in an InMemoryFileSystem.

An interesting question is what is faster:

  1. (this PR): intertwine the synchronous reading of relevant files with running the Microsoft back-end
  2. (master): first read all files in the workspace folder that may be relevant asynchronously and then run the Microsoft back-end

For packages with a directory that contains many files that don't relate to the analysis, which is the case for our package, option (1) works while (2) never completes initializing. However, maybe for packages that only contain relevant files (2) could be faster.


For Sourcegraph, we don't know what "defines" a project in a repo, so every connection passes the repo root as the root URI.

I see, so you definitely need the LSP server to support finding all projects inside the root folder. There could still be a flag though for indicating that the root folder (or workspaceFolders if that feature is used) is/are the roots of the projects.

keyboardDrummer avatar Jun 28 '18 10:06 keyboardDrummer

Actually I just had to revert the last commit in the branch to get a functional build.

ccope avatar Aug 20 '19 21:08 ccope