javascript-typescript-langserver
javascript-typescript-langserver copied to clipboard
Stop caching all workspace files on startup when using the local filesystem
Issues
Resolves issues #471, #477 and #463
Description
Previously when using a LocalFileSystem, all files in the workspace with Typescript related extensions were cached in memory on startup. This causes huge startup times in large workspace directories. This PR resolves that by only reading files from disk when they are needed.
There are a few scenario's in which we must still scan through all files in the workspace:
- There are two scenario where all
tsconfig.json
andjsconfig.json
files in the workspace are read. This only happens on the first request. The two scenario's are:- When sending a
workspace/symbol
request - When sending a
workspace/xreferences
request
- When sending a
- There are two scenario's in which all packages in the workspace are read. This only happens on the first request. The two scenario's are:
- When sending a
workspace/symbol
request and using SourceGraph's LSP extension that passes a package name. - When sending a
workspace/xreferences
request and passing adependeePackageName
value.
- When sending a
Testing
On the one hand this is purely a performance change, so no new tests are expected. However, since the Remote and Local file system now work more differently than before, I've made all the tests from typescript-service.test
now also run using local file system.
Future work
-
Avoid scanning all workspace files, at least for all standard LSP requests. Currently that is not the case for the
workspace/symbol
query request. The reason is that this LSP server allows for multiple projects to exist inside of the workspace, so it must scan to workspace to find these projects. However, the LSP protocol does not require this. You may assume the workspace root is the root of a single project. For handling multiple projects with one LSP server, a client can use the recently introduced workspace folders feature. We should change this LSP server so it uses that instead of scanning the root directory. -
Resolve the mismatch betweens Typescript's synchronous
LanguageServiceHost
API and SourceGraph's asynchronous files extension for LSP. The solution is for Typescript's API to become asynchronous.
This definitely fixes the warmup delay in #477
Codecov Report
Merging #472 into master will increase coverage by
0.72%
. The diff coverage is94.4%
.
@@ Coverage Diff @@
## master #472 +/- ##
==========================================
+ Coverage 83.08% 83.81% +0.72%
==========================================
Files 15 16 +1
Lines 2052 2113 +61
Branches 420 488 +68
==========================================
+ Hits 1705 1771 +66
+ Misses 345 342 -3
+ Partials 2 0 -2
Impacted Files | Coverage Δ | |
---|---|---|
src/util.ts | 90% <100%> (+0.22%) |
:arrow_up: |
src/fs.ts | 100% <100%> (+9.58%) |
:arrow_up: |
src/match-files.ts | 81.77% <100%> (ø) |
:arrow_up: |
src/updater.ts | 87.23% <87.23%> (ø) |
|
src/memfs.ts | 93.22% <88.23%> (+2.4%) |
:arrow_up: |
src/project-manager.ts | 87.65% <94.25%> (+0.58%) |
:arrow_up: |
src/packages.ts | 95.83% <94.59%> (+0.65%) |
:arrow_up: |
src/typescript-service.ts | 85.46% <96.15%> (+0.32%) |
:arrow_up: |
... and 1 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d107424...d2b14f5. Read the comment docs.
The reason is that this LSP server allows for multiple projects to exist inside of the workspace, so it must scan to workspace to find these projects. However, the LSP protocol does not require this. You may assume the workspace root is the root of a single project. For handling multiple projects with one LSP server, a client can use the recently introduced workspace folders feature. We should change this LSP server so it uses that instead of scanning the root directory.
I don't think we can do that - a user doesn't necessarily have each project in its own workspace folder, but may add a monorepo as a single workspace folder. For Sourcegraph, we don't know what "defines" a project in a repo, so every connection passes the repo root as the root URI. Only the langserver has the knowledge to know where to draw language-specific project boundaries - but we have to work with the limitations of LanguageServiceHost
being sync. If a client wants to reduce the number of files scanned, it is free to spawn one language server instance for a subfolder that only contains one project (or use workspaceFolders
if we add support for it).
I have trouble understanding the code architecture after this change. Before I believe the code followed SRP very well: InMemoryFileSystem was the synchronous store that TS would talk to. It was populated by FileSystemUpdater, which synced files to it from any implementation of the async FileSystem interface - either RemoteFileSystem or LocalFileSystem.
I've refactored the original PR. I expect it to be more clear now.
The core of the change is that when using a local files system, file contents are no longer stored in memory. Previously InMemoryFileSystem
did these two things:
- ensure that files in the filesystem can be served synchronously
- overlay the file system with files that aren't saved yet.
This class has been split into two. Responsibility (2) now lies with OverlayFileSystem
. Responsibility (1) lies with the SynchronousFileSystem
interface. The latter has two implementation:
-
LocalFileSystem
. Does not store file contents in memory. Uses synchronous methods fromfs
to serve files. Used in conjunction with theNoopFileSystemUpdater
, which does do any updating since all the files are already present. Note that Microsoft'sLanguageService
back-end already caches all file contents it requests, so performance wise there is no need for theLanguageServiceHost
to do the same. -
InMemoryFileSystem
. Stores files in memory. Used in conjunction withRemoteFileSystemUpdater
which downloads files from aRemoteFileSystem
and stores them in anInMemoryFileSystem
.
An interesting question is what is faster:
- (this PR): intertwine the synchronous reading of relevant files with running the Microsoft back-end
- (master): first read all files in the workspace folder that may be relevant asynchronously and then run the Microsoft back-end
For packages with a directory that contains many files that don't relate to the analysis, which is the case for our package, option (1) works while (2) never completes initializing. However, maybe for packages that only contain relevant files (2) could be faster.
For Sourcegraph, we don't know what "defines" a project in a repo, so every connection passes the repo root as the root URI.
I see, so you definitely need the LSP server to support finding all projects inside the root folder. There could still be a flag though for indicating that the root folder (or workspaceFolders if that feature is used) is/are the roots of the projects.
Actually I just had to revert the last commit in the branch to get a functional build.