jupyter-server-proxy Run server processes on a Unix socket

This is fairly rough, but if the config for a server process includes 'unix': True and no port number, j-s-p will create a new temp folder and give the server process a path inside that instead of a TCP port number. It then expects the process to create and bind a Unix socket at the given path, and will connect to that to forward requests.

On a shared host, this means that only the user whose server this is can connect to the socket, whereas anyone with access to the system can connect to a localhost TCP socket. It also avoids the race condition where the parent selects an unused TCP port, but something else binds that port before the child process can.

For now, I've left websockets out of this, because Tornado's websocket client doesn't seem to easily support passing in a resolver object like its HTTPClient does. I think this can be useful even without websocket support, and I'd hope that we could either add to Tornado or find an alternative like aiohttp to fill that in later.

I have a corresponding branch of my hello_jupyter_proxy repo to test this with: https://github.com/takluyver/hello_jupyter_proxy/tree/unix-sock

Fixes #321

Apr 25 '22 17:04 takluyver

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. welcome You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

Apr 25 '22 17:04 welcome[bot]

hello there, great idea !

I tried using this with code-server: https://github.com/coder/code-server which already have a --socket option, but it gives me a 500 error on launch : [W 2022-06-03 09:42:34.820 ServerApp] 500 GET /vscode/ (127.0.0.1): could not start vscode in time

I can verify with socat that code-server is running on the socket but the request is not going through the proxy or there is some timing issue ? this is my j-s-p config:

c.ServerProxy.servers = {
  'vscode': {
    'command': ['code-server', '--auth', 'none', '--socket', '{port}'],
    'unix': True,
    'absolute_url': False,
    'launcher_entry': {
        'title': 'VS Code'
    },
    'new_browser_tab': True
  }
}

I guess it wont be useful without websockets working, but I still expected to get a initial page to show up, maybe I am doing something wrong ?

Jun 03 '22 13:06 rvalieris

This is a fantastic new feature!
For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!

Jun 12 '22 06:06 jhgoebbert

I created a jupyter-codeserver-proxy at Github and PyPi which would highly profit from support for unix sockets.

Jun 12 '22 14:06 jhgoebbert

The branch unixsocket of jupyter-codeserver-proxy HERE shows that this already works partly for code-server. Cool. But it fails with this dialog:

Jun 13 '22 20:06 jhgoebbert

@jhgoebbert could that error be related to connecting a websocket? Sadly, this branch doesn't support websockets yet. I think it would need either a change in Tornado (https://github.com/tornadoweb/tornado/pull/3172 ), or switching the HTTP client machinery for j-s-p to some other library, such as aiohttp.

@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that. In general, that 'did not start ... in time' message means that it launched the process and then tried to get an HTTP response from it, for up to a second each try and up to 5 seconds total, but that didn't work.

Aug 01 '22 11:08 takluyver

Just to mention, I am still interested in pushing this forwards, if anyone more involved with JSP thinks it's worthwhile.

The biggest open question I see is what to do about websocket support - as it stands, websockets don't work when proxying to a Unix socket. There are 2½ options:

Push on my PR to get this supported in Tornado (which hasn't seen any interest yet) and wait for a new release.
Switch to aiohttp for the client part of the proxy - this may mean somewhat more 'translation' code for requests and responses, because the server side of the proxy would still be using Tornado. But aiohttp is already used to check when the server process is ready, so it's not a new dependency.
Live with no websockets over Unix sockets for now, and improve this later.

Aug 23 '22 16:08 takluyver

I think this would be an awesome feature @takluyver ! Supporting websockets is very important for the apps that I'm most interested in, but maybe others would be okay with living without ws for now.

@yuvipanda mentioned wanting to switch to aiohttp. Reading the tea leaves it seems like jupyter server may eventually move off of tornado anyways.

Aug 23 '22 16:08 ryanlovett

@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that.

Keep in mind that I applied the changes mentioned above ( https://github.com/jupyterhub/jupyter-server-proxy/pull/337/files/885243ac9f1f21ca4869876ecfba0286f486328f ) to run the example.

Aug 26 '22 05:08 jhgoebbert

Ah, sorry, I missed your replies at the time. My tornado PR has now been merged (though it has yet to be released), so for now I've updated this PR to proxy websockets with tornado. This part will need the next version of Tornado to come out, presumably version 6.3.

@jhgoebbert would you have time to test again, with this branch and tornado installed from master?

I'd prefer any conversion to aiohttp to be a separate PR if possible - I think it's easier to review those changes independently than combined. But I'm happy to tackle it in this PR if you ask me to. I imagine it might depend on how soon we can expect a tornado release, and I've asked the maintainer about that.

Sep 20 '22 10:09 takluyver

@takluyver I think a conversion to aiohttp should be in a separate PR as well. Tornado's two most recent minor releases were 7 and 8 months after the previous, so if that trend holds then it will be 5-6 months until the next one.

Sep 20 '22 17:09 ryanlovett

Thanks! Then :crossed_fingers: this should be ready to try out. The combination of a websocket connection and a Unix socket backend will need Tornado 6.3, but websockets+TCP and regular HTTP requests + Unix sockets should work with current versions.

Sep 21 '22 12:09 takluyver

unix is too vague for configuration I think. I suggest the configuration is called unix_socket

That makes sense to me, thanks.

Documentation entry

Good point, will do.

This option should not be used alongside port right, because it would be nonsensical? It would be good to take some action related to this. I'm thinking one could enforce it by erroring, or providing a warning, or documenting it. I think warning + documenting it can make most sense as I think erroring can break things a bit too much.

I've been thinking that it's easy for packages to specify unix_socket=True, and still support older versions of JSP, by binding a TCP socket if they get a numeric port - e.g. this code in my hello_jupyter_proxy project.

I assume in most cases, such projects would already let JSP select a TCP port at random, rather than specifying a fixed port. But I might not complain if both are specified, even though they can't both be used at the same time.

Dec 01 '22 17:12 takluyver

Thanks @takluyver for working on this!!

Review feedback

[x] There is a merge conflict to resolve
[x] There is a mix of unix_sock and unix_socket, I figure we should stick with one (my preference is unix_socket)
[x] SuperviseAndProxyHandler and the unix_sock argument confusion:
- I saw that self.unix_sock = False, and its only used as a boolean value in an if statement. This couples to the configuration option unix_socket, which has a description saying If *True*, the server will listen on a Unix socket at a filesystem path, instead of a TCP port.. To me, this implies that unix_socket is to be a boolean, but in the description it also sais The server should create a Unix socket bound to this path and listen for HTTP requests on it. - so, it should be a unix socket path really, which "if set" could imply something.
- I'm also seeing is_unix_sock which reads the port and doesn't care for the unix_sock value, which make me even more confused as it seems like port is to hold the path after all.
- [ ] I suggest that we seek agreement on what configuration API to implement in this PR, following which an implementation is made and reviewed. We could for example either:
  - support port to be a unix socket path
  - add unix_socket_path as a server process configuration option and emit a warning if port is configured (and therefore ignored) for a server process during the time service process configuration is read.
  Currently I see the second option as easier to understand intuitively, maintain, document, learn about as an end user, communicate as a new added feature in the changelog, and for users to adopt.

Dec 05 '22 22:12 consideRatio

Thanks @takluyver and @consideRatio ! I vote in favor of using a separate parameter for unix_socket_path and not overloading port.

Dec 06 '22 18:12 ryanlovett

This is a fantastic new feature! For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!

@jhgoebbert Instead of saving the password to a temp file as you are doing here, you could as well write it to $HOME/.config/code-server/config.yaml file and change the code server command to include config file in CLI arguments. When user launches code server from launchpad, login page will be shown to user and user can go find the password in $HOME/.config/code-server/config.yaml and use it to authenticate. This is more secure than passing password in urlparams as password always stays in user's home directory. Of course passing password through urlparams will avoid one more step for end user.

I agree that with unix sockets, it will be even more secure, but I think it is still doable with jupyter-server-proxy as it is now. Am I missing something here?

Dec 13 '22 15:12 mahendrapaipuri

@takluyver if you find time to work this, I'll make sure to find time to review this quickly going onwards!

Dec 28 '22 12:12 consideRatio

Do you prefer me to resolve conflicts by rebasing (clean history) or by merging master in (accurate history)?

Jan 17 '23 14:01 takluyver

Hi @takluyver!! Either would be acceptable but my preference at this point is a rebase!

Jan 17 '23 14:01 consideRatio

OK, I've done a rebase.

I suggest that we seek agreement on what configuration API to implement in this PR, following which an implementation is made and reviewed. We could for example either:

support port to be a unix socket path

add unix_socket_path as a server process configuration option and emit a warning if port is configured (and therefore ignored) for a server process during the time service process configuration is read.

I'd roughly implemented for the first of those - it involves minimal new settings to describe, and to my mind there's a kind of equivalence between a port number and a Unix socket path - both specify 'where' to connect to within the context of a particular machine, so it sort of fits to use the same parameter for them. But I think you're right that it's probably easier to document and understand the second option.

One variant on that: we could have the unix_socket parameter take either a path or a special sentinel value, such as True, to use a socket in a new temp directory. That would mirror setting port=0 to get a random TCP port. Any preference between that and having two separate parameters like unix_socket and unix_socket_path?

Jan 17 '23 16:01 takluyver

Oh yes, one concrete reason for conflating the socket path with the port: the command template can use {port} to get either the port number or the Unix socket path. This makes it relatively straightforward to support existing versions of J-s-p which don't support Unix sockets, and platforms where Unix sockets aren't available. The proxied application just needs to check if it has been passed a numeric TCP port or not, like this example.

Options:

Allow both in the command template (e.g. myapp -p {port} -u {unix_socket}), with default values like 0 or '' for whichever is not in use. Proxied applications using this won't be compatible with older versions of j-s-p.
Leave the dual meaning of {port} (TCP port or Unix socket path) for the command template.
Specify two different command templates - command for when a TCP port is used and command_unix for when a Unix socket is used. Compatible with older j-s-p (which will only use command), but ugly.
Something else?

Jan 17 '23 16:01 takluyver

Oh yes, one concrete reason for conflating the socket path with the port: the command template can use {port} to get either the port number or the Unix socket path.

I don't understand this benefit yet, but I'm currently quite strongly opined towards a a self-explanatory name like unix_socket_path without re-using port as it makes me understand what goes on so well, and makes future logic decisions follow quite easily from it.

This makes it relatively straightforward to support existing versions of J-s-p which don't support Unix sockets, and platforms where Unix sockets aren't available.

Hmmmm, I don't understand this example yet.

I'm thinking that whoever has configured JSP server processes with a new JSP config using unix_socket_path can enforce a modern version of JSP. This includes those that has developed a python package that provides this information directly via - they would just start depending on JSP where they provide a lower bound.

Jan 17 '23 16:01 consideRatio

OK, so you'd go for option 1 of my list? On Unix, the command would fill out as something like myapp -p 0 -u /tmp/blah123/socket and on Windows, something like myapp -p 1234 -u ''?

Jan 17 '23 16:01 takluyver

OK, so you'd go for option 1 of my list? On Unix, the command would fill out as something like myapp -p 0 -u /tmp/blah123/socket and on Windows, something like myapp -p 1234 -u ''?

I need help understanding the details here I think, I'm not following this well. Can you provide a paragraph or two describing the situation you consider?

Is it a reflection about providing JSP server process config without being able to know if you are doing it for a windows computer or linux/mac computer where unix sockets are available?

Jan 17 '23 16:01 consideRatio

Yup, of course! I think from your last sentence you've got the gist, but I'll unpack a bit anyway.

The primary case that I've been thinking about in this PR is programs launched by JSP, with a Python package providing config from an entry point. I'm assuming we can't rely on a hardcoded port number or socket path - not least, because on a multi-user system, someone else might have already claimed that port/path. So we want to JSP to dynamically assign the port/path to use, and pass that to the child process.

In current JSP, you do this by specifying port=0 and a command template containing {port}, which JSP will replace with a random port number. The equivalent for a Unix socket is to make a temp directory to contain a new socket (in fact, this is better, because we avoid a race condition). We could expose this as a new templating argument like {unix_socket}. But then:

Only one of {port} and {unix_socket} will be used when we launch the process. Should the other one have some neutral value like 0? Or is it an error to configure a template containing {port} when we're using a Unix socket, and vice versa?
Does configuring unix_socket=True mean 'this must use a Unix socket' or 'use a Unix socket if available'? I.e. is each application responsible for checking that it's not on Windows before requesting Unix sockets, or can it specify unix_socket=True and expect JSP to fallback to TCP sockets if it's running on Windows?
(Do we want to arrange it so the same fallback works with older JSP which only supports TCP sockets? I think you've already answered this (saying no), but I've included it here because it's part of my chain of thought).

Jan 17 '23 17:01 takluyver

I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock, tcp://localhost:12345), and also maps well onto the underlying unix networking- they're effectively the same under the hood.

Would you support this if we renamed port to something more generic and kept {port} for backwards compatibility in the command template?

Jan 17 '23 17:01 manics

OK, with the new changes:

unix_socket=True still allocates a temp directory for a new socket, akin to port=0.
unix_socket='/var/run/something.sock' expects the socket at a known path. This can also be used without a command to have a named proxy to something not launched by JSP (as added for TCP ports in #339).
The command template uses {unix_socket} to get the path of the Unix socket. This expand to the empty string if we're using a TCP port, while {port} will expand to 0 if we're using a Unix socket.

I've rearranged some of the code in handlers.py to facilitate this. There's a new class NamedLocalProxyHandler for when we configure a proxy with a name but no command (so JSP is not managing the process). SuperviseAndProxyHandler is now back to handling only the case where we have a command to launch and supervise.

I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock, tcp://localhost:12345),

ZMQ also uses a similar addressing scheme, although it calls Unix sockets ipc://. But there's no standard that I know of for this, especially if you want to include a way to specify 'Unix socket in new random temp directory'. And it's a fair few extra characters when you just want to specify a TCP port. Up to the JSP maintainers, but I think we're heading in the direction of having separate options.

Jan 18 '23 11:01 takluyver

Would you support this if we renamed port to something more generic and kept {port} for backwards compatibility in the command template?

Maybe, but I don't understand things well enough to form a clear opinion. I trust your judgement though @manics!

I've invested quite a bit of effort into understanding this feature already, but still feel quite lost. @manics could you work with @takluyver towards resolving this PR without me? I have a growing backlog of things I'd like to contribute with in the jupyterhub org, and reviewing something I don't understand well takes quite a bit of time and effort for me.

General review point

[x] Update the PR title and description to reflect the state of the PR

Jan 18 '23 13:01 consideRatio

I've invested quite a bit of effort into understanding this feature already, but still feel quite lost.

Sorry to hear that. If it would help, I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it. But I'll respect your decision to focus on other things if you prefer.

Jan 18 '23 13:01 takluyver

I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it.

@takluyver that would be great to have, I think it would be suitable to put in an updated PR description! I'll probably end up personally benefiting from it as well when reviewing other work related to this in the feature.

I think a key piece of the complexity for me has been a lack of understanding of the assumptions of what is to be accomodated. I for example assumed that you would need to specify a unix socket path explicitly rather than allow for a temporary path be generated.

I think now, thinking about what I don't undertand its one key piece - that we need a configuration API to support the wish to use unix sockets without specifying it in the server process configuration.

Another key piece is the coupling with port in any way. But I'm starting to understand that its relevant because when you specify the command, you specify it without knowing if its linux or windows in case you develop a Python package that ships with a server process config snippet. I think I assumed that you wouldn't be able to provide either port or unix socket path to a command string and have it work in most applications. But, if that is expected in most applications, then having a single variable could be relevant. If taking this path though, one may still get into trouble if not all applications support this, so then maybe one end up needing to provide two different command's - or maybe one could let command be a callable to render differently based on linux / windows etc.

try writing a brief summary of the goal - as I see it - and how I've tried to implement it.

A big :+1: to providing a summary of the goals to accomplish with the provided implementation, the more I think about it, the more I understand that a common understanding of such goals is crucial to review and think about the implementation.

Oh I see you have now updated the description, its great!

This works already for regular HTTP requests. Forwarding websockets over a Unix socket will work once Tornado 6.3 is released.

Is this part of the traitlet configuration's help string already? If not, its probably worth putting there as well.

and fill the new {unix_socket} template argument

A key piece of understanding that I didn't caught onto quickly was that the server process definition's command was given these arguments, so maybe you could inject "command" in the sentence I quoted as well?

Thank you soo much for your thorough work on this @takluyver and helping me understand these details better!

Jan 18 '23 14:01 consideRatio

jupyter-server-proxy jupyter-server-proxy copied to clipboard

Run server processes on a Unix socket

Review feedback

General review point

jupyter-server-proxy
jupyter-server-proxy copied to clipboard