jupyterlab-lsp icon indicating copy to clipboard operation
jupyterlab-lsp copied to clipboard

Tcp mode

Open FlyingSamson opened this issue 3 years ago • 19 comments

References

I could not find any explicit issue that would be solved by this PR. However, the requirement for TCP support has been mentioned in #282 and #184 (The letter also contains some comments regarding design decisions)

Code changes

  • As discussed in #184 this PR firstly provides the switch from AsyncIO to AnyIO for the LanguageServerSession class and the LspStreamBase, LspStreamReader and LspStreamWriter classes (formerly LspStdIoBase, LspStdIoReader and LspStdIoWriter)
  • Second it does provide the option to connect to TCP Language Servers (tested with Wolfram-Language-Server, I'm happy to also provide my specification for it in another PR).
  • As there are now different ways to connect to a LS, the specification given in python_packages/jupyter_lsp/jupyter_lsp/schema/schema.json was extended by a mode property which may take on values stdio or tcp, stdio being the default if no mode is specified, so as to not break existing language server specifications. If mode==tcp, a LS is started in a new process on localhost and on a randomly selected free port. The port is controlled by replacing each occurrence of the placeholder {port} within the argv of the server's spec with that port.

Backwards-incompatible changes

There should be no backward-incompatible changes as the default values for the newly introduced mode property is stdio and thus no changes for the (currently solely used) language servers communicating over stdio is required.

TODOS:

  • [x] Adopt the documentation regarding extension of jupyterlab-lsp by new language servers (-> now also possible to use tcp servers, explain new configuration parameters)
  • [x] Split Session into StdioSession and TCPSession

Chores

  • [x] linted
  • [x] tested
  • [x] documented
  • [x] changelog entry

FlyingSamson avatar Jun 29 '21 17:06 FlyingSamson

Thank you so much for this! :heart_eyes: Despite all the :x:, this is a really good step forward.

I guess I'd like to see Session split into two implementations so there is just the one if mode == "tcp" check at the manager level. It is probably reasonable to either:

  • have the (mostly) battle-tested one be the base implementation, and have the new TCP one subclass it
  • make a new base class with a few easy-to-grok inner implementations in TCPSession and StdioSession

I wouldn't go too far overboard with abc's etc.

I'm only a casual user of anyio 2 (despite being tangentially to blame for its introduction into the jupyter stack), so i'll have to do a fair amount study to be able to help evaluate the specifics of this fully, but it does look like a good step forward. I'd really love we if we could get rid of explicit executors, and/or replace our gross, cargo-culted non-blocking win32 code with anyio constructs, but that is a whole separate ball of wax.

It looks like we will be able to test this fairly robustly with python-lsp, which supports --tcp, and indeed would need to crawl back up the coverage scale to feel good about it.

Giving it a pass through jlpm lint and python scripts/lint.py would also help clean things up a hair.

Solider on, and let us know if you get hard stuck anywhere!

bollwyvl avatar Jun 30 '21 01:06 bollwyvl

Thank you so much for taking the time to review my changes and your input! I hope I will find the time in the close future to fix everything.

Regarding anyio: I am not overly proud of the way I managed to get this working with anyio, especially the intersection where non async code (but still running within an existing event loop from jupyter lab) needs to call async code seems much to complicated for my taste. Even more so since the anyio change where start_blocking_portal now required a context manager which does not seem to play overly well with the object oriented world. So if you should find any better way to handle things, please let me know.

I will also try and split the Session class in two.

FlyingSamson avatar Jun 30 '21 17:06 FlyingSamson

While trying to get things running for externally running lsp-servers, I came across following problem (or so I think):

It seems that the current implementation never asks the server to shut down properly, i.e. no sequence shutdown→ Server, ack→Client, exit→ Server, seems to take place. I presume that was because each session was creating its own process running the required LS anyway, and after the work was done that process was just terminated. However, now that we would potentially like to support asking externally running servers for completions, we would have to close the connection more gracefully with accordance to the language server protocol, no?

So my question is: Am I right about this part of the protocol currently not being implemented? If so, I presume one would have to extend the ClientRequest and ClientNotification enums in connection.ts. But I have no clue as to how I would proceed from there as I have zero knowledge of javascript programming. Do you have some pointers how I would go from there?

FlyingSamson avatar Jul 03 '21 16:07 FlyingSamson

So my question is: Am I right about this part of the protocol currently not being implemented?

Yes. I am not sure when we want to ask servers to close, but at the minimum we need to do so at JupyterLab shutdown, which is also the safest option (but not trivial to detect). It is tricky because user closing the window does not mean that the session ends - they may be just switching to another browser or refreshing the window. So we would probably need to have a hook in the python code using atexit. Shutting the language server down when all documents for given language are closed is an option, but this requires investigation of how that would work for multi-user deployments.

Having said that, there is also a "shut down" option in JupyterLab's "File" menu, though I never use it (I always exit by terminating the server and closing the window) as I think many users do, hence the remark about the need of having atexit hook.

If so, I presume one would have to extend the ClientRequest and ClientNotification enums in connection.ts

Yes, if we want to emit those from within the frontend (which as described above is complicated and could only reliably happen if and only if the "Shut down" button was pressed in JupyterLab, as any other server disconnect might indicate a temporary network failure, etc...). The next step would be to update IClientNotifyParams and IClientRequestParams, but again I am not convinced whether we want to have this handled by the frontend, or rather by the python extension instead.

We also need to investigate how this would interact with multi-user deployments such as JupyterHub and Real Time Collaboration. In short I do not think that we should implement this in this PR because there are too many unknowns; you are right that we should look into it and thank you for starting this discussion. I think we should create a new issue out of this to allow others to chime in with ideas before deciding on implementation.

Edit: there is a stop_extension hook in works in jupyter_server and we should probably use that over atexit: https://github.com/jupyter-server/jupyter_server/pull/526

krassowski avatar Jul 03 '21 21:07 krassowski

All right, thanks for clarifying. I will focus on implementing the TCP support for self launched processes for now then and open a separate issue to discuss on how to proceed from this stage, when this PR is done.

FlyingSamson avatar Jul 04 '21 14:07 FlyingSamson

Picked up conflicts with schema. Sadly GitHub Actions do not kick in until conflicts are resolved. I also added a note about stop_extension hook in my previous comment.

krassowski avatar Jul 08 '21 17:07 krassowski

Picked up conflicts with schema. Sadly GitHub Actions do not kick in until conflicts are resolved. I also added a note about stop_extension hook in my previous comment.

All right, so I can just pull master and merge it into the tcp_mode branch, thereby resolving the conflict, and then push again, correct?

FlyingSamson avatar Jul 08 '21 17:07 FlyingSamson

pull master and merge it into the tcp_mode branch,

or do it through the web ui if you're feeling brave...

bollwyvl avatar Jul 08 '21 17:07 bollwyvl

Binder gives you a virtual network, and it acts a little funny. In CI/binder/hub, they run even funnier, and as a user of such a service, one has even less control of them. We've had to do a fair amount of upstream work to ensure that test suites (e.g. pygls) can be run as-is... Usually the localhost/127 is all that's needed, but sometimes hard coded ports also need some techniques e.g. environment variables, random port discovery.

So yeah, outside of tests, anything that takes an address should at least accept overriding the host/port, and pick robust, loudly announced default behavior if unspecified.

bollwyvl avatar Jul 24 '21 14:07 bollwyvl

Thank you for good questions! I will aim to give some answers during the weekend.

I wanted to explain why I cancelled the CI runs: there was a new JupyterLab (3.1) release which required some changes to the Robot tests on CI and those were going to fail slowly taking CI time, so I needed to stop them. It is all fixed now so once you merge master or rebase the CI here should work fine.

krassowski avatar Aug 02 '21 22:08 krassowski

Sorry for the long silence, I had a deadline at work which was demanding my attention. But now I have more time and will try to get this ready :)

FlyingSamson avatar Oct 23 '21 11:10 FlyingSamson

No worries and thank you for persevering! You may need to rebase to pick up a new CI dependencies pin which prevents CI from failing.

krassowski avatar Oct 24 '21 20:10 krassowski

I'm still struggling to get the stop procedure to work on windows. According to the anyio docs Process.kill is an alias of Process.terminate on Windows, which in turn calls the Win32 API function TerminateProcess(). However, the latter, opposed to the behavior of Process.kill on Posix systems, seems to be asynchronous, i.e., it will return immediately and does not ensure that the process is dead before returning. Does somebody know how I could achieve the same behavior on Windows?

If that indeed is not possible, how should I handle this on Windows?
Should I just raise a warning in session.stop_process if the OS is Windows instead of the call to Process.kill on Posix systems?
How am I supposed to test this? The current test works on Posix, so I presume that is fine. Should I just give Windows some extra time (not sure how much more time Windows will require, as I don't have a Windows machine to test this locally, maybe 10 seconds for a start) and check afterwards if the process has gone away? This would than at least test that the process will die at some point, if not at the time that we force it to on other systems.
If indeed it takes that long for Windows to shut down the process, should the timeout parameter of session.stop_process be OS dependent, to keep it short on Posix systems.

Sorry for the myriad of questions. I'm thankful for any advice :)

FlyingSamson avatar Oct 31 '21 12:10 FlyingSamson

I'm still struggling to get the stop procedure to work on windows.

If no one has the ability to test/support Windows properly we can merge the best solutions you find appropriate and release this saying that Windows support is provisional and "help wanted".

Should I just give Windows some extra time (not sure how much more time Windows will require, as I don't have a Windows machine to test this locally, maybe 10 seconds for a start) and check afterwards if the process has gone away?

Sounds good to me.

If indeed it takes that long for Windows to shut down the process, should the timeout parameter of session.stop_process be OS dependent, to keep it short on Posix systems.

yes

krassowski avatar Oct 31 '21 13:10 krassowski

I think the code should be ready for another review round (I don't think that the smoke installation error was caused by the made changes).

To fully take advantage of the structured concurrency offered by anyio, I think more work needs to be done from the jupyterlab extension API side, as the current implementation using Tornado Websockets, while supporting coroutines for open() and on_message(), does not seem to be made for structured concurrency. E.g., the doc of open() states that no messages will be processed until open() has returned, thus preventing any structured concurrency happening within without using another event loop in a separate thread.

FlyingSamson avatar Jul 06 '22 16:07 FlyingSamson

Looking at the issue of freezes everywhere except for Ubuntu py3.10 on; there is no sane way to debug it on CI/with pytest, but locally I can reproduce it. Starting lab I see:

Exception while listening Cannot add child handler, the child watcher does not have a loop attached
    Traceback (most recent call last):
      File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 162, in run
        await self.initialize()
      File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 180, in initialize
        await self.init_process()
      File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 328, in init_process
        await self.start_process(self.spec["argv"])
      File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 230, in start_process
        env=self.substitute_env(self.spec.get("env", {}), os.environ),
      File ".pyenv/versions/3.7.15/envs/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_core/_subprocesses.py", line 135, in open_process
        start_new_session=start_new_session,
      File ".pyenv/versions/3.7.15/envs/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1112, in open_process
        start_new_session=start_new_session,
      File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/subprocess.py", line 217, in create_subprocess_exec
        stderr=stderr, **kwds)
      File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/base_events.py", line 1544, in subprocess_exec
        bufsize, **kwargs)
      File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/unix_events.py", line 193, in _make_subprocess_transport
        self._child_watcher_callback, transp)
      File ".pyenv/versions/3.7.15/lib/python3.7/asyncio/unix_events.py", line 941, in add_child_handler
        "Cannot add child handler, "
    RuntimeError: Cannot add child handler, the child watcher does not have a loop attached

Coming from self.process = await anyio.open_process() call.

krassowski avatar Jan 01 '23 13:01 krassowski

This appears to be https://bugs.python.org/issue35621 which was only fixed in Python 3.8 as the workaround from https://bugs.python.org/msg370355:

import asyncio
from .threaded_child_watcher import ThreadedChildWatcher   # where ThreadedChildWatcher was copied from CPython
asyncio.set_child_watcher(ThreadedChildWatcher())

does work. This is for Unix only, no idea how to fix this on Windows, and the workaround may have side-effects on tornado.

Switching to trio backend gives us:

RuntimeError: There is no current event loop in thread 'Thread-2'.

krassowski avatar Jan 01 '23 21:01 krassowski

I guess the question becomes: do we really need/want start_process to by async?

krassowski avatar Jan 01 '23 23:01 krassowski

So things are getting interesting with https://github.com/krassowski/jupyterlab-lsp/pull/4/commits/12bfbd42b6085f290affd1f6375049d5bc27a229:

  • Windows py3.10 works on CI (which gives me hope) but throws ValueError: I/O operation on closed pipe from time to time in tests which is a known problem and apparently can be worked around by calling private process._transport.close(). Probably worth finding a bpo or filing an issue with cPython.
  • Ubuntu py3.7 starts up locally but hangs when running tests (both locally and on CI). When manually aborting the tests we can see that it raises:
Traceback (most recent call last):
  File "/3.7.15/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/3.7.15/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/3.7.15/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/3.7.15/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 179, in run
    await self.cleanup()
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 205, in cleanup
    await self.stop_process(self.stop_timeout)
  File "jupyterlab-lsp/python_packages/jupyter_lsp/jupyter_lsp/session.py", line 254, in stop_process
    self.process.terminate()
  File "/lsp-lab3.x-py3.7/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1053, in terminate
    self._process.terminate()
  File "/3.7.15/lib/python3.7/asyncio/subprocess.py", line 131, in terminate
    self._transport.terminate()
  File "/3.7.15/lib/python3.7/asyncio/base_subprocess.py", line 150, in terminate
    self._check_proc()
  File "/3.7.15/lib/python3.7/asyncio/base_subprocess.py", line 143, in _check_proc
    raise ProcessLookupError()
ProcessLookupError

krassowski avatar Jan 02 '23 17:01 krassowski