jupyter-server-proxy
jupyter-server-proxy copied to clipboard
Run server processes on a Unix socket
This is fairly rough, but if the config for a server process includes 'unix': True
and no port number, j-s-p will create a new temp folder and give the server process a path inside that instead of a TCP port number. It then expects the process to create and bind a Unix socket at the given path, and will connect to that to forward requests.
On a shared host, this means that only the user whose server this is can connect to the socket, whereas anyone with access to the system can connect to a localhost TCP socket. It also avoids the race condition where the parent selects an unused TCP port, but something else binds that port before the child process can.
For now, I've left websockets out of this, because Tornado's websocket client doesn't seem to easily support passing in a resolver object like its HTTPClient does. I think this can be useful even without websocket support, and I'd hope that we could either add to Tornado or find an alternative like aiohttp to fill that in later.
I have a corresponding branch of my hello_jupyter_proxy
repo to test this with: https://github.com/takluyver/hello_jupyter_proxy/tree/unix-sock
Fixes #321
Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
hello there, great idea !
I tried using this with code-server: https://github.com/coder/code-server which already have a --socket
option, but it gives me a 500 error on launch :
[W 2022-06-03 09:42:34.820 ServerApp] 500 GET /vscode/ (127.0.0.1): could not start vscode in time
I can verify with socat that code-server is running on the socket but the request is not going through the proxy or there is some timing issue ? this is my j-s-p config:
c.ServerProxy.servers = {
'vscode': {
'command': ['code-server', '--auth', 'none', '--socket', '{port}'],
'unix': True,
'absolute_url': False,
'launcher_entry': {
'title': 'VS Code'
},
'new_browser_tab': True
}
}
I guess it wont be useful without websockets working, but I still expected to get a initial page to show up, maybe I am doing something wrong ?
This is a fantastic new feature!
For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!
I created a jupyter-codeserver-proxy
at Github and PyPi which would highly profit from support for unix sockets.
The branch unixsocket
of jupyter-codeserver-proxy
HERE shows that this already works partly for code-server
. Cool. But it fails with this dialog:
@jhgoebbert could that error be related to connecting a websocket? Sadly, this branch doesn't support websockets yet. I think it would need either a change in Tornado (https://github.com/tornadoweb/tornado/pull/3172 ), or switching the HTTP client machinery for j-s-p to some other library, such as aiohttp.
@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that. In general, that 'did not start ... in time' message means that it launched the process and then tried to get an HTTP response from it, for up to a second each try and up to 5 seconds total, but that didn't work.
Just to mention, I am still interested in pushing this forwards, if anyone more involved with JSP thinks it's worthwhile.
The biggest open question I see is what to do about websocket support - as it stands, websockets don't work when proxying to a Unix socket. There are 2½ options:
- Push on my PR to get this supported in Tornado (which hasn't seen any interest yet) and wait for a new release.
- Switch to aiohttp for the client part of the proxy - this may mean somewhat more 'translation' code for requests and responses, because the server side of the proxy would still be using Tornado. But aiohttp is already used to check when the server process is ready, so it's not a new dependency.
- Live with no websockets over Unix sockets for now, and improve this later.
I think this would be an awesome feature @takluyver ! Supporting websockets is very important for the apps that I'm most interested in, but maybe others would be okay with living without ws for now.
@yuvipanda mentioned wanting to switch to aiohttp. Reading the tea leaves it seems like jupyter server may eventually move off of tornado anyways.
@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that.
Keep in mind that I applied the changes mentioned above ( https://github.com/jupyterhub/jupyter-server-proxy/pull/337/files/885243ac9f1f21ca4869876ecfba0286f486328f ) to run the example.
Ah, sorry, I missed your replies at the time. My tornado PR has now been merged (though it has yet to be released), so for now I've updated this PR to proxy websockets with tornado. This part will need the next version of Tornado to come out, presumably version 6.3.
@jhgoebbert would you have time to test again, with this branch and tornado installed from master?
I'd prefer any conversion to aiohttp to be a separate PR if possible - I think it's easier to review those changes independently than combined. But I'm happy to tackle it in this PR if you ask me to. I imagine it might depend on how soon we can expect a tornado release, and I've asked the maintainer about that.
@takluyver I think a conversion to aiohttp should be in a separate PR as well. Tornado's two most recent minor releases were 7 and 8 months after the previous, so if that trend holds then it will be 5-6 months until the next one.
Thanks! Then :crossed_fingers: this should be ready to try out. The combination of a websocket connection and a Unix socket backend will need Tornado 6.3, but websockets+TCP and regular HTTP requests + Unix sockets should work with current versions.
unix is too vague for configuration I think. I suggest the configuration is called unix_socket
That makes sense to me, thanks.
Documentation entry
Good point, will do.
This option should not be used alongside port right, because it would be nonsensical? It would be good to take some action related to this. I'm thinking one could enforce it by erroring, or providing a warning, or documenting it. I think warning + documenting it can make most sense as I think erroring can break things a bit too much.
I've been thinking that it's easy for packages to specify unix_socket=True
, and still support older versions of JSP, by binding a TCP socket if they get a numeric port - e.g. this code in my hello_jupyter_proxy
project.
I assume in most cases, such projects would already let JSP select a TCP port at random, rather than specifying a fixed port. But I might not complain if both are specified, even though they can't both be used at the same time.
Thanks @takluyver for working on this!!
Review feedback
- [x] There is a merge conflict to resolve
- [x] There is a mix of
unix_sock
andunix_socket
, I figure we should stick with one (my preference isunix_socket
) - [x]
SuperviseAndProxyHandler
and theunix_sock
argument confusion:-
I saw that
self.unix_sock = False
, and its only used as a boolean value in an if statement. This couples to the configuration optionunix_socket
, which has a description sayingIf *True*, the server will listen on a Unix socket at a filesystem path, instead of a TCP port.
. To me, this implies thatunix_socket
is to be a boolean, but in the description it also saisThe server should create a Unix socket bound to this path and listen for HTTP requests on it.
- so, it should be a unix socket path really, which "if set" could imply something. -
I'm also seeing
is_unix_sock
which reads theport
and doesn't care for theunix_sock
value, which make me even more confused as it seems likeport
is to hold the path after all. -
[ ] I suggest that we seek agreement on what configuration API to implement in this PR, following which an implementation is made and reviewed. We could for example either:
- support
port
to be a unix socket path - add
unix_socket_path
as a server process configuration option and emit a warning ifport
is configured (and therefore ignored) for a server process during the time service process configuration is read.
Currently I see the second option as easier to understand intuitively, maintain, document, learn about as an end user, communicate as a new added feature in the changelog, and for users to adopt.
- support
-
Thanks @takluyver and @consideRatio ! I vote in favor of using a separate parameter for unix_socket_path
and not overloading port
.
This is a fantastic new feature! For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!
@jhgoebbert Instead of saving the password to a temp file as you are doing here, you could as well write it to $HOME/.config/code-server/config.yaml
file and change the code server command to include config file in CLI arguments. When user launches code server from launchpad, login page will be shown to user and user can go find the password in $HOME/.config/code-server/config.yaml
and use it to authenticate. This is more secure than passing password in urlparams as password always stays in user's home directory. Of course passing password through urlparams will avoid one more step for end user.
I agree that with unix sockets, it will be even more secure, but I think it is still doable with jupyter-server-proxy
as it is now. Am I missing something here?
@takluyver if you find time to work this, I'll make sure to find time to review this quickly going onwards!
Do you prefer me to resolve conflicts by rebasing (clean history) or by merging master in (accurate history)?
Hi @takluyver!! Either would be acceptable but my preference at this point is a rebase!
OK, I've done a rebase.
I suggest that we seek agreement on what configuration API to implement in this PR, following which an implementation is made and reviewed. We could for example either:
- support port to be a unix socket path
- add unix_socket_path as a server process configuration option and emit a warning if port is configured (and therefore ignored) for a server process during the time service process configuration is read.
I'd roughly implemented for the first of those - it involves minimal new settings to describe, and to my mind there's a kind of equivalence between a port number and a Unix socket path - both specify 'where' to connect to within the context of a particular machine, so it sort of fits to use the same parameter for them. But I think you're right that it's probably easier to document and understand the second option.
One variant on that: we could have the unix_socket
parameter take either a path or a special sentinel value, such as True, to use a socket in a new temp directory. That would mirror setting port=0
to get a random TCP port. Any preference between that and having two separate parameters like unix_socket
and unix_socket_path
?
Oh yes, one concrete reason for conflating the socket path with the port: the command template can use {port}
to get either the port number or the Unix socket path. This makes it relatively straightforward to support existing versions of J-s-p which don't support Unix sockets, and platforms where Unix sockets aren't available. The proxied application just needs to check if it has been passed a numeric TCP port or not, like this example.
Options:
- Allow both in the command template (e.g.
myapp -p {port} -u {unix_socket}
), with default values like 0 or''
for whichever is not in use. Proxied applications using this won't be compatible with older versions of j-s-p. - Leave the dual meaning of
{port}
(TCP port or Unix socket path) for the command template. - Specify two different command templates -
command
for when a TCP port is used andcommand_unix
for when a Unix socket is used. Compatible with older j-s-p (which will only usecommand
), but ugly. - Something else?
Oh yes, one concrete reason for conflating the socket path with the port: the command template can use {port} to get either the port number or the Unix socket path.
I don't understand this benefit yet, but I'm currently quite strongly opined towards a a self-explanatory name like unix_socket_path
without re-using port
as it makes me understand what goes on so well, and makes future logic decisions follow quite easily from it.
This makes it relatively straightforward to support existing versions of J-s-p which don't support Unix sockets, and platforms where Unix sockets aren't available.
Hmmmm, I don't understand this example yet.
I'm thinking that whoever has configured JSP server processes with a new JSP config using unix_socket_path can enforce a modern version of JSP. This includes those that has developed a python package that provides this information directly via - they would just start depending on JSP where they provide a lower bound.
OK, so you'd go for option 1 of my list? On Unix, the command would fill out as something like myapp -p 0 -u /tmp/blah123/socket
and on Windows, something like myapp -p 1234 -u ''
?
OK, so you'd go for option 1 of my list? On Unix, the command would fill out as something like
myapp -p 0 -u /tmp/blah123/socket
and on Windows, something likemyapp -p 1234 -u ''
?
I need help understanding the details here I think, I'm not following this well. Can you provide a paragraph or two describing the situation you consider?
Is it a reflection about providing JSP server process config without being able to know if you are doing it for a windows computer or linux/mac computer where unix sockets are available?
Yup, of course! I think from your last sentence you've got the gist, but I'll unpack a bit anyway.
The primary case that I've been thinking about in this PR is programs launched by JSP, with a Python package providing config from an entry point. I'm assuming we can't rely on a hardcoded port number or socket path - not least, because on a multi-user system, someone else might have already claimed that port/path. So we want to JSP to dynamically assign the port/path to use, and pass that to the child process.
In current JSP, you do this by specifying port=0
and a command template containing {port}
, which JSP will replace with a random port number. The equivalent for a Unix socket is to make a temp directory to contain a new socket (in fact, this is better, because we avoid a race condition). We could expose this as a new templating argument like {unix_socket}
. But then:
- Only one of
{port}
and{unix_socket}
will be used when we launch the process. Should the other one have some neutral value like 0? Or is it an error to configure a template containing{port}
when we're using a Unix socket, and vice versa? - Does configuring
unix_socket=True
mean 'this must use a Unix socket' or 'use a Unix socket if available'? I.e. is each application responsible for checking that it's not on Windows before requesting Unix sockets, or can it specifyunix_socket=True
and expect JSP to fallback to TCP sockets if it's running on Windows? - (Do we want to arrange it so the same fallback works with older JSP which only supports TCP sockets? I think you've already answered this (saying no), but I've included it here because it's part of my chain of thought).
I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock
, tcp://localhost:12345
), and also maps well onto the underlying unix networking- they're effectively the same under the hood.
Would you support this if we renamed port
to something more generic and kept {port}
for backwards compatibility in the command template?
OK, with the new changes:
-
unix_socket=True
still allocates a temp directory for a new socket, akin toport=0
. -
unix_socket='/var/run/something.sock'
expects the socket at a known path. This can also be used without a command to have a named proxy to something not launched by JSP (as added for TCP ports in #339). - The command template uses
{unix_socket}
to get the path of the Unix socket. This expand to the empty string if we're using a TCP port, while{port}
will expand to 0 if we're using a Unix socket.
I've rearranged some of the code in handlers.py
to facilitate this. There's a new class NamedLocalProxyHandler
for when we configure a proxy with a name but no command (so JSP is not managing the process). SuperviseAndProxyHandler
is now back to handling only the case where we have a command to launch and supervise.
I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock, tcp://localhost:12345),
ZMQ also uses a similar addressing scheme, although it calls Unix sockets ipc://
. But there's no standard that I know of for this, especially if you want to include a way to specify 'Unix socket in new random temp directory'. And it's a fair few extra characters when you just want to specify a TCP port. Up to the JSP maintainers, but I think we're heading in the direction of having separate options.
Would you support this if we renamed port to something more generic and kept {port} for backwards compatibility in the command template?
Maybe, but I don't understand things well enough to form a clear opinion. I trust your judgement though @manics!
I've invested quite a bit of effort into understanding this feature already, but still feel quite lost. @manics could you work with @takluyver towards resolving this PR without me? I have a growing backlog of things I'd like to contribute with in the jupyterhub org, and reviewing something I don't understand well takes quite a bit of time and effort for me.
General review point
- [x] Update the PR title and description to reflect the state of the PR
I've invested quite a bit of effort into understanding this feature already, but still feel quite lost.
Sorry to hear that. If it would help, I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it. But I'll respect your decision to focus on other things if you prefer.
I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it.
@takluyver that would be great to have, I think it would be suitable to put in an updated PR description! I'll probably end up personally benefiting from it as well when reviewing other work related to this in the feature.
I think a key piece of the complexity for me has been a lack of understanding of the assumptions of what is to be accomodated. I for example assumed that you would need to specify a unix socket path explicitly rather than allow for a temporary path be generated.
I think now, thinking about what I don't undertand its one key piece - that we need a configuration API to support the wish to use unix sockets without specifying it in the server process configuration.
Another key piece is the coupling with port
in any way. But I'm starting to understand that its relevant because when you specify the command, you specify it without knowing if its linux or windows in case you develop a Python package that ships with a server process config snippet. I think I assumed that you wouldn't be able to provide either port or unix socket path to a command string and have it work in most applications. But, if that is expected in most applications, then having a single variable could be relevant. If taking this path though, one may still get into trouble if not all applications support this, so then maybe one end up needing to provide two different command's - or maybe one could let command be a callable to render differently based on linux / windows etc.
try writing a brief summary of the goal - as I see it - and how I've tried to implement it.
A big :+1: to providing a summary of the goals to accomplish with the provided implementation, the more I think about it, the more I understand that a common understanding of such goals is crucial to review and think about the implementation.
Oh I see you have now updated the description, its great!
This works already for regular HTTP requests. Forwarding websockets over a Unix socket will work once Tornado 6.3 is released.
Is this part of the traitlet configuration's help string already? If not, its probably worth putting there as well.
and fill the new
{unix_socket}
template argument
A key piece of understanding that I didn't caught onto quickly was that the server process definition's command
was given these arguments, so maybe you could inject "command" in the sentence I quoted as well?
Thank you soo much for your thorough work on this @takluyver and helping me understand these details better!