Support for non-file URIs

Open felixfbecker opened this issue 6 years ago • 6 comments

Hi 👋 I opened a similar discussion at scalameta/metals, but am wondering the same question about the coming native LSP support in dotty.

I peeked around the code a bit, but couldn't tell whether URIs other than the file: scheme are supported. file: URIs are usually used by editors for on-disk files, but when deploying a language server as a service for online code viewers, web IDEs, code intelligence backends for doc pages with code snippets, etc., the language server typically runs in an isolated container that does not have the files checked out to disk.

In that scenario, instead of file:// URLs, at Sourcegraph we give the language server http:// URLs that point to endpoints where the text documents can be retrieved with a simple GET request.

Will dotty have the infrastructure to handle URIs other than file:?

felixfbecker avatar Dec 11 '18 19:12 felixfbecker

Hello!

Will dotty have the infrastructure to handle URIs other than file:?

To work properly, our language server needs to at least have access to the content of every source file in the project, so at a minimum we'd need an LSP extension to get that I think, then we can construct a virtual filesystem on our end.

Also, eventually we'd like to integrate our server with BSP to get errors/warnings/.. from the full build pipeline, which would complicate things since now the build tool also has to be aware of these URIs.

the language server typically runs in an isolated container that does not have the files checked out to disk.

What about creating an OS-level virtual file system where file accesses are backed by GET requests (e.g. using FUSE) ? That would avoid having to add support for virtual filesystems in a bunch of different tools.

online code viewers, web IDEs, code intelligence backend for doc pages with code snippets

Some of these usecases overlap with what https://github.com/Microsoft/language-server-protocol/blob/master/indexFormat/specification.md is supposed to solve, what do you think of that approach ?

smarter avatar Dec 12 '18 18:12 smarter

To work properly, our language server needs to at least have access to the content of every source file in the project, so at a minimum we'd need an LSP extension to get that I think, then we can construct a virtual filesystem on our end.

The URLs provided would allow you to access any file in the workspace and resolve paths. For example, the rootUri passed to the language server would be https://sourcegraph.com/github.com/lampepfl/dotty/-/raw/, and it can resolve any file path from that through a relative URL resolve just like you do with file URLs, e.g. the file language-server/src/dotty/tools/languageserver/DottyLanguageServer.scala is at https://sourcegraph.com/github.com/lampepfl/dotty/-/raw/language-server/src/dotty/tools/languageserver/DottyLanguageServer.scala. Does that work?
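As a minimal sketch of the relative resolution described above: the snippet uses Python's standard `urljoin`, and `resolve_in_workspace` is a hypothetical helper name, not part of any language server. The same call would resolve against a file: root as well, which is the point being made.

```python
from urllib.parse import urljoin

# Root URI as given in the comment above; the trailing slash matters for resolution.
ROOT_URI = "https://sourcegraph.com/github.com/lampepfl/dotty/-/raw/"


def resolve_in_workspace(root_uri: str, relative_path: str) -> str:
    """Resolve a workspace-relative path against the root URI (hypothetical helper)."""
    return urljoin(root_uri, relative_path)


print(resolve_in_workspace(
    ROOT_URI,
    "language-server/src/dotty/tools/languageserver/DottyLanguageServer.scala",
))
```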

What about creating an OS-level virtual file system where file accesses are backed by GET requests (e.g. using FUSE) ? That would avoid having to add support for virtual filesystems in a bunch of different tools.

That's one possibility we haven't explored yet, but a benefit of using URIs is that these URIs can be authenticated (you can include a token in the auth section of the URL) so the language server container doesn't need to be worried about authentication. Not sure how that would work with FUSE, but I think it would be more complicated.
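To illustrate what "a token in the auth section of the URL" could mean for the server side, here is a rough sketch. It assumes the backend accepts the userinfo part as HTTP Basic credentials, which is an assumption for illustration and not something stated in this thread; `split_credentials` is a hypothetical helper, and it does not handle IPv6 hosts or percent-encoded userinfo.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url: str):
    """Return (url_without_userinfo, headers) for a URL that may carry a token.

    Assumption: the token is sent as HTTP Basic auth. Hypothetical helper.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    # Rebuild the network location without the userinfo part.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    credentials = f"{parts.username}:{parts.password or ''}"
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return clean_url, {"Authorization": f"Basic {encoded}"}
```

With this shape, the language server never needs its own credential store: whatever it needs is carried in the URIs the client hands it.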

Some of these usecases overlap with what Microsoft/language-server-protocol:indexFormat/specification.md@master is supposed to solve, what do you think of that approach ?

The LSIF is very interesting, but it's designed for caching read-only workspaces. In an online IDE, files are changing and LSIF can't help: you still need to run an actual language server as a service, sync state with it, have it request things, etc. In addition, LSIF is an index format generated by a language server, so you also need to run the language server to generate it in the first place (so you can then use it as a cache later).

felixfbecker avatar Dec 12 '18 18:12 felixfbecker

Does that work?

Ah I see, yes I think that with some effort we could get that to work, although I'm not sure what performance would be like. It's a bit hard to do this in the abstract though, do you have an LSP client that works over http available somewhere ?

That's one possibility we haven't explored yet, but a benefit of using URIs is that these URIs can be authenticated (you can include a token in the auth section of the URL)

I don't know the specifics, but I believe you could do that with FUSE too, e.g. you'd mount the filesystem by providing a base URI, including auth section, and the name of the directory where the fs is mounted, then any access to a/b in this directory would be equivalent to GET baseurl/a/b/.

Now that I'm thinking about it there might be a simpler solution that gets us most of the way there: at the beginning of an editing session, could the server get a zip of the full working directory from the client (e.g. by doing a GET on https://sourcegraph.com/github.com/lampepfl/dotty/workspace.zip ) ? This would be much faster than doing a ton of back-and-forth through HTTP requests, and would give us a base state from which we can run the build tool to extract information, run the compiler to get an initial state of cached information, etc. Starting from that, everything else would then behave as if development was done locally but files were never saved to disk.

smarter avatar Dec 12 '18 20:12 smarter

Ah I see, yes I think that with some effort we could get that to work, although I'm not sure what performance would be like.

If the language server and the file-serving backend are deployed on the same node, HTTP latency is not really an issue. Ultimately, the files exist in some git repo, so they have to be fetched over HTTP one way or another; the only difference is whether everything is fetched up-front in one request, or lazily (and in parallel) with multiple requests as needed.
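A rough sketch of the "multiple parallel requests" variant, using only the Python standard library. `fetch_files` and `http_get` are hypothetical names, and the fetch function is injectable so the sketch can be exercised without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.parse import urljoin
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Fetch one document with a plain GET request."""
    with urlopen(url) as response:
        return response.read()


def fetch_files(
    root_uri: str,
    paths: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch only the requested workspace files, in parallel (hypothetical helper)."""
    paths = list(paths)
    urls = [urljoin(root_uri, path) for path in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        contents = list(pool.map(fetch, urls))
    return dict(zip(paths, contents))
```

Whether this beats a single archive download depends on how many files are actually needed, which is the open question in the rest of the thread.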

It's a bit hard to do this in the abstract though, do you have an LSP client that works over http available somewhere ?

This is a Sourcegraph extension that connects to a language server through WebSockets and passes it HTTP URLs as rootUri and for requests on text documents, if that helps: https://github.com/sourcegraph/lang-typescript/blob/master/extension/src/extension.ts

This way it can provide hover tooltips, go to definition etc. on sourcegraph.com and on GitHub (including when reviewing PRs, through the Sourcegraph Chrome extension).

I don't know the specifics, but I believe you could do that with FUSE too, e.g. you'd mount the filesystem by providing a base URI, including auth section, and the name of the directory where the fs is mounted, then any access to a/b in this directory would be equivalent to GET baseurl/a/b/.

I see. It still sounds more complex to deploy - a language server that works with HTTP URLs can be deployed on any Linux box (or Heroku) by just running a single isolated process, which is nice. But it may be something worth looking into.

Now that I'm thinking about it there might be a simpler solution that gets us most of the way there: at the beginning of an editing session, could the server get a zip of the full working directory from the client (e.g. by doing a GET on sourcegraph.com/github.com/lampepfl/dotty/workspace.zip ) ? This would be much faster than doing a ton of back-and-forth through HTTP requests, and would give us a base state from which we can run the build tool to extract information, run the compiler to get an initial state of cached information, etc. Starting from that, everything else would then behave as if development was done locally but files were never saved to disk.

Yes, that is actually already possible and how we make it work for language servers that expect files on disk (e.g. because they shell into tools). All you need to do is send the Accept: application/zip (or Accept: application/x-tar) header to the URL of any directory and it returns an archive instead of the content listing. If that is faster than individual requests dotty can totally utilise this capability. What to be aware of though is that this is usually slower when the repository is a big monorepo, as it fetches the entire monorepo with lots of not-needed files (assuming always the root is requested). Are you imagining that archive would be read into memory, or extracted to disk? If it's extracted, disk performance is also important to consider.
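For reference, a minimal sketch of the archive request described above. Only the `Accept: application/zip` header comes from this comment; the function names are hypothetical, and extraction to disk is just one of the two options being asked about.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import Request, urlopen


def build_archive_request(directory_url: str) -> Request:
    """Ask for a zip archive of a directory URL instead of its content listing."""
    return Request(directory_url, headers={"Accept": "application/zip"})


def extract_zip_bytes(data: bytes, target_dir: str) -> list:
    """Extract an in-memory zip archive into target_dir and return the member names."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


def download_workspace(directory_url: str, target_dir: str) -> list:
    """Download the directory as a zip and extract it (performs a network request)."""
    with urlopen(build_archive_request(directory_url)) as response:
        return extract_zip_bytes(response.read(), target_dir)
```

Here the whole archive is held in memory before extraction, which is simple but may not suit a large monorepo; a streaming tar download would be the alternative to consider.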

felixfbecker avatar Dec 12 '18 20:12 felixfbecker

What to be aware of though is that this is usually slower when the repository is a big monorepo, as it fetches the entire monorepo with lots of not-needed files

I can see how it could be slower for repos that contain some big binary artifacts indeed, but I think that on average it would be faster (it's also hard to know in advance which files are needed or not-needed to run the build tool, e.g. with sbt you can add files to the build at arbitrary locations in your project).

Are you imagining that archive would be read into memory, or extracted to disk ?

I would extract it to disk, since that's the only way I could get build tools to work (without investing a ton of effort into getting them to support virtual filesystems)

smarter avatar Dec 12 '18 20:12 smarter

Also when I say "extract it to disk" it could be an in-memory filesystem to avoid disk performance issues (tmpfs is built into Linux and does the job well)
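A small sketch of that idea, assuming `/dev/shm` is a tmpfs mount (typical on Linux, but an assumption about the deployment). `make_workspace_dir` is a hypothetical helper that falls back to the regular temp directory when that mount is not available.

```python
import os
import tempfile


def make_workspace_dir(preferred_base: str = "/dev/shm") -> str:
    """Create a scratch directory for the extracted workspace (hypothetical helper).

    Uses preferred_base when it exists and is writable (assumed to be tmpfs),
    otherwise falls back to the platform's default temp directory.
    """
    usable = os.path.isdir(preferred_base) and os.access(preferred_base, os.W_OK)
    return tempfile.mkdtemp(prefix="workspace-", dir=preferred_base if usable else None)
```

Build tools then see ordinary file paths, while the contents stay in memory.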

smarter avatar Dec 12 '18 20:12 smarter