jupyterlab-lsp
Tcp mode
References
I could not find an existing issue that would be explicitly solved by this PR. However, the requirement for TCP support has been mentioned in #282 and #184 (the latter also contains some comments regarding design decisions).
Code changes
- As discussed in #184, this PR first provides the switch from AsyncIO to AnyIO for the `LanguageServerSession` class and the `LspStreamBase`, `LspStreamReader` and `LspStreamWriter` classes (formerly `LspStdIoBase`, `LspStdIoReader` and `LspStdIoWriter`).
- Second, it provides the option to connect to TCP language servers (tested with Wolfram-Language-Server; I'm happy to also provide my specification for it in another PR).
- As there are now different ways to connect to a language server, the specification given in `python_packages/jupyter_lsp/jupyter_lsp/schema/schema.json` was extended by a `mode` property which may take the values `stdio` or `tcp`, with `stdio` being the default if no `mode` is specified, so as not to break existing language server specifications. If `mode == "tcp"`, a language server is started in a new process on `localhost` on a randomly selected free port. The port is controlled by replacing each occurrence of the placeholder `{port}` within the `argv` of the server's `spec` with that port.
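The free-port selection and `{port}` substitution described above can be sketched as follows; the function names and the example `spec` are illustrative, not the PR's actual implementation:

```python
import socket


def pick_free_port() -> int:
    """Ask the OS for a free port by binding to port 0."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def substitute_port(argv: list, port: int) -> list:
    """Replace each occurrence of the {port} placeholder in argv."""
    return [token.replace("{port}", str(port)) for token in argv]


# Hypothetical spec entry using the new "mode" property
spec = {
    "argv": ["some-language-server", "--tcp", "--port", "{port}"],
    "mode": "tcp",
}

port = pick_free_port()
resolved_argv = substitute_port(spec["argv"], port)
```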
Backwards-incompatible changes
There should be no backwards-incompatible changes: the default value of the newly introduced `mode` property is `stdio`, so no changes are required for the (currently sole) kind of language servers, which communicate over stdio.
TODOs:
- [x] Adapt the documentation regarding extending jupyterlab-lsp with new language servers (-> now also possible to use TCP servers; explain new configuration parameters)
- [x] Split `Session` into `StdioSession` and `TCPSession`
Chores
- [x] linted
- [x] tested
- [x] documented
- [x] changelog entry
Thank you so much for this! :heart_eyes: Despite all the :x:, this is a really good step forward.
I guess I'd like to see `Session` split into two implementations so there is just the one `if mode == "tcp"` check at the manager level. It is probably reasonable to either:
- have the (mostly) battle-tested one be the base implementation, and have the new TCP one subclass it
- make a new base class with a few easy-to-grok inner implementations in `TCPSession` and `StdioSession`

I wouldn't go too far overboard with `abc`s etc.
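The second option above could look roughly like this; the class and method names are hypothetical, not the PR's actual API:

```python
class LanguageServerSessionBase:
    """Shared session lifecycle; only the transport differs per subclass."""

    def describe_transport(self) -> str:
        raise NotImplementedError


class StdioSession(LanguageServerSessionBase):
    def describe_transport(self) -> str:
        # reader/writer would wrap the child process's stdin/stdout
        return "stdio"


class TCPSession(LanguageServerSessionBase):
    def describe_transport(self) -> str:
        # reader/writer would wrap a socket connected to the server's port
        return "tcp"
```

This keeps the manager-level check to a single choice of class to instantiate, with no `abc` machinery needed.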
I'm only a casual user of anyio 2 (despite being tangentially to blame for its introduction into the jupyter stack), so I'll have to do a fair amount of study to be able to evaluate the specifics of this fully, but it does look like a good step forward. I'd really love it if we could get rid of explicit executors, and/or replace our gross, cargo-culted non-blocking win32 code with anyio constructs, but that is a whole separate ball of wax.
It looks like we will be able to test this fairly robustly with `python-lsp`, which supports `--tcp`, and indeed we would need to crawl back up the coverage scale to feel good about it.
Giving it a pass through `jlpm lint` and `python scripts/lint.py` would also help clean things up a hair.
Soldier on, and let us know if you get hard stuck anywhere!
Thank you so much for taking the time to review my changes, and for your input! I hope I will find the time in the near future to fix everything.
Regarding anyio: I am not overly proud of the way I managed to get this working with anyio; especially the intersection where non-async code (but still running within an existing event loop from JupyterLab) needs to call async code seems much too complicated for my taste. Even more so since the anyio change where `start_blocking_portal` now requires a context manager, which does not seem to play overly well with the object-oriented world. So if you should find any better way to handle things, please let me know.
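One common way to fit a context manager into object-oriented code is to hold it open on a `contextlib.ExitStack` owned by the object. A minimal sketch, using a stand-in context manager in place of anyio's `start_blocking_portal` so the example stays self-contained:

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def open_portal():
    # Stand-in for anyio's start_blocking_portal(); yields the "portal"
    yield "portal"


class PortalHolder:
    """Keeps a context manager open for the lifetime of the object."""

    def __init__(self):
        self._stack = ExitStack()
        # enter_context() enters the CM now and defers its exit
        self.portal = self._stack.enter_context(open_portal())

    def close(self):
        # Exits every context manager entered on the stack
        self._stack.close()
```

The object's `close()` (or `__del__`/lifecycle hook) then plays the role the `with` block would otherwise play.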
I will also try and split the Session class in two.
While trying to get things running for externally running lsp-servers, I came across following problem (or so I think):
It seems that the current implementation never asks the server to shut down properly, i.e. no `shutdown` → server, ack → client, `exit` → server sequence seems to take place. I presume that was because each session was creating its own process running the required language server anyway, and after the work was done that process was just terminated. However, now that we would potentially like to support asking externally running servers for completions, we would have to close the connection more gracefully, in accordance with the Language Server Protocol, no?
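For reference, the handshake in question consists of a `shutdown` request (which the server acknowledges) followed by an `exit` notification. Sketched as plain JSON-RPC payloads (ignoring the `Content-Length` framing LSP uses on the wire):

```python
import json

# Client -> server: ask the server to prepare for exit ("shutdown" request)
shutdown_request = {"jsonrpc": "2.0", "id": 42, "method": "shutdown"}

# Server -> client: acknowledges with a null result
shutdown_response = {"jsonrpc": "2.0", "id": 42, "result": None}

# Client -> server: actually exit (a notification, hence no "id")
exit_notification = {"jsonrpc": "2.0", "method": "exit"}

wire = "\r\n".join(json.dumps(m) for m in (shutdown_request, exit_notification))
```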
So my question is: Am I right about this part of the protocol currently not being implemented? If so, I presume one would have to extend the `ClientRequest` and `ClientNotification` enums in `connection.ts`. But I have no clue how to proceed from there, as I have zero knowledge of JavaScript programming. Do you have some pointers on how I would go from there?
So my question is: Am I right about this part of the protocol currently not being implemented?
Yes. I am not sure when we want to ask servers to close, but at the minimum we need to do so at JupyterLab shutdown, which is also the safest option (but not trivial to detect). It is tricky because the user closing the window does not mean that the session ends: they may just be switching to another browser or refreshing the window. So we would probably need to have a hook in the python code using `atexit`. Shutting the language server down when all documents for a given language are closed is an option, but this requires investigation of how that would work for multi-user deployments.
Having said that, there is also a "shut down" option in JupyterLab's "File" menu, though I never use it (I always exit by terminating the server and closing the window), as I think many users do; hence the remark about the need for an `atexit` hook.
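Such an `atexit` hook could look roughly like this; the session manager and its API are hypothetical, only the `atexit.register` mechanism is real:

```python
import atexit


class SessionManager:
    """Hypothetical holder of live language server sessions."""

    def __init__(self):
        self.sessions = []

    def shutdown_all(self):
        # would send the shutdown/exit handshake to each live server
        for session in list(self.sessions):
            self.sessions.remove(session)


manager = SessionManager()
# run shutdown_all when the Python interpreter exits normally
atexit.register(manager.shutdown_all)
```

Note that `atexit` only fires on normal interpreter exit, which is part of why a dedicated server-side hook would be preferable.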
If so, I presume one would have to extend the ClientRequest and ClientNotification enums in connection.ts
Yes, if we want to emit those from within the frontend (which, as described above, is complicated and could only reliably happen if and only if the "Shut down" button was pressed in JupyterLab, as any other server disconnect might indicate a temporary network failure, etc.). The next step would be to update `IClientNotifyParams` and `IClientRequestParams`, but again I am not convinced whether we want to have this handled by the frontend, or rather by the python extension instead.
We also need to investigate how this would interact with multi-user deployments such as JupyterHub and Real Time Collaboration. In short I do not think that we should implement this in this PR because there are too many unknowns; you are right that we should look into it and thank you for starting this discussion. I think we should create a new issue out of this to allow others to chime in with ideas before deciding on implementation.
Edit: there is a `stop_extension` hook in the works in `jupyter_server` and we should probably use that over `atexit`: https://github.com/jupyter-server/jupyter_server/pull/526
All right, thanks for clarifying. I will focus on implementing the TCP support for self launched processes for now then and open a separate issue to discuss on how to proceed from this stage, when this PR is done.
Picked up conflicts with schema. Sadly, GitHub Actions do not kick in until conflicts are resolved. I also added a note about the `stop_extension` hook in my previous comment.
All right, so I can just pull master and merge it into the tcp_mode branch, thereby resolving the conflict, and then push again, correct?
pull master and merge it into the tcp_mode branch,
or do it through the web ui if you're feeling brave...
Binder gives you a virtual network, and it acts a little funny. In CI/binder/hub, they run even funnier, and as a user of such a service, one has even less control over them. We've had to do a fair amount of upstream work to ensure that test suites (e.g. pygls) can be run as-is... Usually localhost/127 is all that's needed, but sometimes hard-coded ports also need some techniques, e.g. environment variables or random port discovery.
So yeah, outside of tests, anything that takes an address should at least accept overriding the host/port, and pick robust, loudly announced default behavior if unspecified.
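One way to follow that advice is to honour environment overrides and fall back to loudly announced defaults; the variable names here are made up for illustration:

```python
import os
import socket


def resolve_address():
    """Honour overrides; otherwise pick robust defaults and announce them."""
    host = os.environ.get("EXAMPLE_LSP_HOST", "127.0.0.1")  # hypothetical var
    port = int(os.environ.get("EXAMPLE_LSP_PORT", "0"))  # hypothetical var
    if port == 0:
        # 0 means: let the OS pick a free port for us
        with socket.socket() as s:
            s.bind((host, 0))
            port = s.getsockname()[1]
    print(f"language server will listen on {host}:{port}")
    return host, port
```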
Thank you for good questions! I will aim to give some answers during the weekend.
I wanted to explain why I cancelled the CI runs: there was a new JupyterLab (3.1) release which required some changes to the Robot tests on CI and those were going to fail slowly taking CI time, so I needed to stop them. It is all fixed now so once you merge master or rebase the CI here should work fine.
Sorry for the long silence, I had a deadline at work which was demanding my attention. But now I have more time and will try to get this ready :)
No worries, and thank you for persevering! You may need to rebase to pick up a new CI dependency pin which prevents CI from failing.
I'm still struggling to get the stop procedure to work on Windows. According to the anyio docs, `Process.kill` is an alias of `Process.terminate` on Windows, which in turn calls the Win32 API function `TerminateProcess()`. However, the latter, as opposed to the behavior of `Process.kill` on Posix systems, seems to be asynchronous, i.e., it returns immediately and does not ensure that the process is dead before returning. Does somebody know how I could achieve the same behavior on Windows?
If that indeed is not possible, how should I handle this on Windows?
Should I just raise a warning in `session.stop_process` if the OS is Windows, instead of the call to `Process.kill` made on Posix systems?
How am I supposed to test this? The current test works on Posix, so I presume that is fine. Should I just give Windows some extra time (not sure how much more time Windows will require, as I don't have a Windows machine to test this locally, maybe 10 seconds for a start) and check afterwards if the process has gone away? This would then at least test that the process will die at some point, if not at the time that we force it to on other systems.
If indeed it takes that long for Windows to shut down the process, should the `timeout` parameter of `session.stop_process` be OS-dependent, to keep it short on Posix systems?
Sorry for the myriad of questions. I'm thankful for any advice :)
I'm still struggling to get the stop procedure to work on windows.
If no one has the ability to test/support Windows properly we can merge the best solutions you find appropriate and release this saying that Windows support is provisional and "help wanted".
Should I just give Windows some extra time (not sure how much more time Windows will require, as I don't have a Windows machine to test this locally, maybe 10 seconds for a start) and check afterwards if the process has gone away?
Sounds good to me.
If indeed it takes that long for Windows to shut down the process, should the timeout parameter of session.stop_process be OS dependent, to keep it short on Posix systems.
yes
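The terminate-then-wait approach with an OS-dependent timeout can be sketched with the stdlib alone; the timeout values below are illustrative, and `stop_process` is not the PR's actual function:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float) -> None:
    """Terminate, then wait; escalate to kill if the process lingers."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


# e.g. keep the timeout short on Posix, longer on Windows
timeout = 10.0 if sys.platform == "win32" else 2.0
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_process(proc, timeout)
```

`Popen.wait(timeout=...)` polls until the process is actually gone, which sidesteps the question of whether `TerminateProcess()` has finished by the time `terminate()` returns.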
I think the code should be ready for another review round (I don't think that the smoke installation error was caused by the changes made here).
To fully take advantage of the structured concurrency offered by anyio, I think more work needs to be done on the JupyterLab extension API side, as the current implementation using Tornado WebSockets, while supporting coroutines for `open()` and `on_message()`, does not seem to be made for structured concurrency. E.g., the docs for `open()` state that no messages will be processed until `open()` has returned, thus preventing any structured concurrency from happening within it without using another event loop in a separate thread.
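The "another event loop in a separate thread" workaround mentioned above has a well-known stdlib shape; the `handshake` coroutine is a placeholder for whatever the session would actually run:

```python
import asyncio
import threading

# Run a dedicated event loop in a background thread
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


async def handshake() -> str:
    # placeholder for work the session needs to run concurrently
    await asyncio.sleep(0)
    return "ready"


# Non-async code (e.g. a Tornado handler) can submit work and block on it:
future = asyncio.run_coroutine_threadsafe(handshake(), loop)
result = future.result(timeout=5)

loop.call_soon_threadsafe(loop.stop)
```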
Looking at the issue of freezes everywhere except for Ubuntu from py3.10 on; there is no sane way to debug it on CI or with pytest, but locally I can reproduce it. Starting lab I see:
```
Exception while listening Cannot add child handler, the child watcher does not have a loop attached
Traceback (most recent call last):
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 162, in run
    await self.initialize()
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 180, in initialize
    await self.init_process()
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 328, in init_process
    await self.start_process(self.spec["argv"])
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 230, in start_process
    env=self.substitute_env(self.spec.get("env", {}), os.environ),
  File ".pyenv/versions/3.7.15/envs/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_core/_subprocesses.py", line 135, in open_process
    start_new_session=start_new_session,
  File ".pyenv/versions/3.7.15/envs/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1112, in open_process
    start_new_session=start_new_session,
  File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/subprocess.py", line 217, in create_subprocess_exec
    stderr=stderr, **kwds)
  File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/base_events.py", line 1544, in subprocess_exec
    bufsize, **kwargs)
  File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/unix_events.py", line 193, in _make_subprocess_transport
    self._child_watcher_callback, transp)
  File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/unix_events.py", line 941, in add_child_handler
    "Cannot add child handler, "
RuntimeError: Cannot add child handler, the child watcher does not have a loop attached
```
coming from the `self.process = await anyio.open_process()` call.
This appears to be https://bugs.python.org/issue35621, which was only fixed in Python 3.8, as the workaround from https://bugs.python.org/msg370355:

```python
import asyncio

from .threaded_child_watcher import ThreadedChildWatcher  # where ThreadedChildWatcher was copied from CPython

asyncio.set_child_watcher(ThreadedChildWatcher())
```

does work. This is for Unix only; no idea how to fix this on Windows, and the workaround may have side effects on tornado.
Switching to the trio backend gives us:

```
RuntimeError: There is no current event loop in thread 'Thread-2'.
```
I guess the question becomes: do we really need/want `start_process` to be `async`?
So things are getting interesting with https://github.com/krassowski/jupyterlab-lsp/pull/4/commits/12bfbd42b6085f290affd1f6375049d5bc27a229:
- Windows py3.10 works on CI (which gives me hope) but throws `ValueError: I/O operation on closed pipe` from time to time in tests, which is a known problem and apparently can be worked around by calling the private `process._transport.close()`. Probably worth finding a bpo or filing an issue with CPython.
- Ubuntu py3.7 starts up locally but hangs when running tests (both locally and on CI). When manually aborting the tests we can see that it raises:
```
Traceback (most recent call last):
  File "/3.7.15/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/3.7.15/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/3.7.15/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/3.7.15/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 179, in run
    await self.cleanup()
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 205, in cleanup
    await self.stop_process(self.stop_timeout)
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 254, in stop_process
    self.process.terminate()
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1053, in terminate
    self._process.terminate()
  File "/3.7.15/lib/python3.7/asyncio/subprocess.py", line 131, in terminate
    self._transport.terminate()
  File "/3.7.15/lib/python3.7/asyncio/base_subprocess.py", line 150, in terminate
    self._check_proc()
  File "/3.7.15/lib/python3.7/asyncio/base_subprocess.py", line 143, in _check_proc
    raise ProcessLookupError()
ProcessLookupError
```