kubespawner
kubespawner copied to clipboard
Shouldn't KubeSpawner's cmd/args map to the K8s container's command/args?
I saw that c.KubeSpawner.default_url was reported to not have an impact and decided to investigate.
I end up concluding that configuring the command that the container starts with is very complicated. The Spawner, K8s, and the Dockerfile all have various names for similar things that all combine into a k8s containers args field where the command field apparently isn't configurable to my surprise.
cmd,args,notebook_dir,default_url, andget_args(...)are defined in in the Spawner base classget_args(...)combinesnotebook_dir,default_url, andargsinto a single appendix tocmd. There are changes to this in JupyterHub version 2.0.0 though, see theget_argsdefinition.- KubeSpawner sets the containers k8s field
argsto the variablereal_cmddecided below, which changed in 5ee6dc6f. Note that in k8s manifestscommandandargsare the equivalent of Dockerfilesentrypointandcmd.async def get_pod_manifest(self): # ... if self.cmd: real_cmd = self.cmd + self.get_args() else: # change commit comment: # fix use of get_args() (calling get_args() alone would omit the command) - real_cmd = self.get_args() + real_cmd = None - Z2JH's default config of KubeSpawner's
cmdis[]for z2jh >= 2.0.0 butjupyterhub-singleuserfor z2jh < 2.0.0. This means that by default, z2jh 2.0.0+ will expose this bug by default and silently ignoreargs,notebook_dir, ordefault_urlunlesscmdis set explicitly as well. (Related PR: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2449)
Investigation conclusion
- I'm confused we opt to set
argsof the k8s container based on the traitletscmd,args,notebook_dir,default_urlinstead of settingcommandtocmd, and settingargstodefault_url,notebook_dirandargs. - I think the logic to silently ignore the traitlets
args,notebook_dir,default_urlwhencmdis unset is problematic and should lead to a warning at least.
I am trying to launch a rstudio image with this, and I am very much struggling that I can not "completely control" the command and args passed to starting the container
Do you think we can resolve this issue before Z2JH 1.0.0 is released? Either by adding docs and warnings, or perhaps by adding a new entrypoint parameter, and making it clear that entrypoint and cmd have their Docker semantics.
Although this is a K8s project I think it's fine to use the Docker terminology, especially as it's also the Open Container Initiative terminology. What's more important is that we make it clear and use consistent names throughout.
I agree that we must be consistent, and in this case we must be consistent against the Spawner base class that is k8s unaware, so then adding entrypoint would make sense alongside cmd.
We would then document that cmd and entrypoint refers to the Dockerfile names, and that we will set the k8s Pod's container's args and command section when we set cmd and entrypoint in KubeSpawner.
To conclude: :+1: for adding entrypoint as a KubeSpawner traitlet to configure the containers command.
@consideRatio is there going to be a separate fix for c.KubeSpawner.default_url ?
@mapsacosta if you set cmd the other args such as default_url should be passed. If it's not working can you show us your full configuration?
Update
In the initial post I wrote investigative conclusions, and I want to give an update about those two points.
- I'm confused we opt to set
argsof the k8s container based on the traitletscmd,args,notebook_dir,default_urlinstead of settingcommandtocmd, and settingargstodefault_url,notebook_dirandargs.- I think the logic to silently ignore the traitlets
args,notebook_dir,default_urlwhencmdis unset is problematic and should lead to a warning at least.
- Point 1 is addressed by:
- defining the new configuration
entrypointalongsidecmd, where we also document that these relate to the Dockerfile'sENTRYPOINTandCMDrather than k8scommandandargsequivalent Container specifications. - clarifying that the
cmdconfiguration if set also augments itself withspawner.get_args()which considers the configurationsip,port,default_url,notebook_dir,debug,disable_user_config,args.
- defining the new configuration
- No clear decision on how to address point 2 isn't taken and I'm struggling to suggest a sensible solution.
- Not setting
cmdcurrently influence a lot to a weird undocumented state by omitting all the associated configuration silently. - We could pass
spawner.get_args()no matter what, but then we probably introduce a breaking change for someone or start flooding their startup script with weird command line flags. - Ideas:
- to toggle passing get_args() by another logic than if
cmdis set or not - to always pass get_args() no matter if
cmdis set or not
- to toggle passing get_args() by another logic than if
- Not setting
About spawner.get_args()
Note that the args considered by get_args() are as defined in JupyterHub's spawner.py:
| Configuration | Resulting argument |
|---|---|
spawner.ip |
--ip=... |
spawner.port |
--port=... |
spawner.default_url |
--SingleUserNotebookApp.default_url=... |
spawner.notebook_dir |
--notebook-dir=... |
spawner.debug |
--debug |
spawner.disable_user_config |
--disable-user-config |
spawner.args |
<user defined args> |
Action points
- [ ] Add
entrypointconfiguration (@consideRatio works on this) - [ ] Update docs etc to bring more clarity to
cmd,entrypoint, andargs(@consideRatio works on this) - [ ] Clarify how to resolve point 2 (@manics @minrk could you help deliberate about this?)
- [ ] Resolve point 2
Related
- #502 - updated
cmdconfig documentation
A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?
On the other hand if we go for a breaking change I think either we should keep the ability to not pass additional arguments, or clearly document the command arguments interface that should be supported.
Originally jupyter-notebook was the only client, but now we have jupyter-lab, jupyter-server, and people have written their own alternatives such as https://discourse.jupyter.org/t/new-package-to-run-arbitrary-web-service-in-jupyterhub-jhsingle-native-proxy/3493 (which is also one of the motivations for https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2138)
A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?
Hmmm... I think we should opt for a potentially breaking change to resolve the future needs properly.
- Ability to configure
ENTRYPOINTandCMDvery clearly without interference fromget_args() - Ability to opt-in/opt-out of augmenting
get_args()
I realize now that we can't augment get_args() to the Dockerfile's defined CMD because it would simply be overridden entirely, and that is probably the logic for now always passing get_args().
So... setting spawner.cmd will override the Dockerfile's CMD, so get_args() won't augment with it and would often not make sense unless something like jupyterhub-singleuser was defined in the ENTRYPOINT.
Also... you may want to configure spawner.cmd without ending up having get_args() augmented.
I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:
- always append if
cmdis set (current KubeSpawner behavior) - always append to
cmdindependent if its set or not (Dockerfile CMD will never be considered) - never append to
cmdindependent if its set or not
I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that? The current behavior in KubeSpawner is KubeSpawner specific, and I think the Spawner base class current behavior is to always append get_args() to cmd.
It is extremely confusing that Kubernetes renames Docker's Entrypoint and Command as Command and Args, respectively!
While Kubernetes' names more accurately describe what happens (a command is passed arguments), I think the docker names more accurately describe what they are for (an entrypoint prepares an environment in which to launch a command).
In that sense, cmd and args are used to build a single command list, which we pass to kubernetes container arguments, just like we would with docker run $image command ... and I think we should never interact with the entrypoint at all, other than requiring that it allow arbitrary commands.
This is part of why I think things like docker stacks should implement their environment customization in entrypoint instead of CMD. Overriding CMD shouldn't be hugely consequential, but it is for docker-stacks.
I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:
- always append if cmd is set (current KubeSpawner behavior)
- always append to cmd independent if its set or not (Dockerfile CMD will never be considered)
- never append to cmd independent if its set or not
I'd add that option 2 would also require restoring the default cmd = 'jupyterhub-singleuser' because it doesn't make sense to append args without starting with the command itself. This would also make KubeSpawner consistent with all the non-container-based Spawners.
I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that?
I think this is really a KubeSpawner-specific issue, and should probably be a kubespawner option. It is really only the "get the default command from the image with kubespawner" case that is affected, because kubespawner cannot actually retrieve this info from the image. If the command is specified, everything works.
The Spawner API is that cmd ultimately launches jupyterhub-singleuser, and args will be passed to it. It is part of the API that this command accepts the CLI args of jupyterhub-singleuser, even if it may be a wrapper or adapter. Launching a command other than that is not currently supported by JupyterHub. What makes KubeSpawner (and DockerSpawner) unique and require special handling is that they have another source for the default command: the CMD field of the image itself. Dockerspawner deals with this by inspecting the image to extract CMD before appending arguments, if Spawner.cmd is unset. I don't know how feasible that would be in KubeSpawner (we'd have to deal with all kinds of registry and pull secret stuff to get it, I think). So I'm not sure what we can do to support that here. Maybe this means using the image's CMD by default is not something that's feasible in KubeSpawner.
This is related to why I opened https://github.com/jupyterhub/jupyterhub/pull/3381 - it would be nice if we could stop specifying CLI args (by default) in Spawners, and do everything through the env, but it's trickier to remove than I realized. I would like to get to a point where JupyterHub internally does not turn any options into CLI args, other than explicit CLI args from user config, and only communicates options through the environment. Then this problem would go away, I think...
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/kubespawner-and-ldapauthentication-run-under-users-ldap-uid/8988/7
I think this issue is mostly resolved by the fact that jupyterhub 2.0 does not specify any CLI arguments, nor place any requirements on the CLI signature of the launch command, and never will again. That means that Spawner.args no longer needs to conform to jupyterhub-singleuser, it only needs to conform to whatever the user chooses for cmd. A key point is that there's no difference between specifying singleuser.cmd and singleuser.args vs just concatenating the two in singleuser.cmd. In fact, given that the only case that doesn't work is specifying singleuser.args when using the command from the image (i.e. singleuser.cmd unspecified), arguably with KubeSpawner one should always specify singleuser.cmd as the full list with any args they want to pass, and can safely ignore singleuser.args.
This simplifies the configuration explanation for z2jh in 2.0:
If you want to specify the command to launch, there are two possibilities:
- (default) specify nothing to use the image's default command, which is assumed to eventually launch
jupyterhub-singleuser, or- customize the launch command and any arguments, via
Spawner.cmd, for example:c.Spawner.cmd = ["jupyterhub-singleuser", "--some-arg=5", "..."]Customizing only the arguments without specifying the command to launch (i.e. setting
Spawner.argswithoutSpawner.cmd) is not supported.
Thank you for clarifying things @minrk! I'm not yet comfortable with the situation about cmd in Spawner, KubeSpawner, and z2jh and consider if we should make some change before z2jh 2.0.0 about that.
Renewed overview
- [x] The
default_urlsituation is resolved if we assume use of jupyterhub 2.3.1+ - [x] The
argsandget_args()situation in KubeSpawner is acceptable, we provide a warning ifargsis ignored becausecmdisn't set.get_args()implementation in jupyterhub's spawner.pyget_pod_manifest()in KubeSpawner combinescmdandget_args()ifcmdis set and warns ifargsis set butcmdisn't. The resulting command will override a Dockerfile'sCMD.
- [ ] The
cmdsituation isn't feeling great:cmddefined in jupyterhub's spawner.py, defaults to["jupyterhub-singleuser"].cmdre-defined in kubespawner's spawner.py, defaults to None.cmdconfiguration in z2jh's values.yaml, defaults to None.- z2jh's jupyterhub_config.py doesn't set
c.Spawner.cmdwhensingleuser.cmdis set to None.- z2jh's configuration could benefit from another state to track if
cmdisn't to be set vs if it is to be set with any acceptable value, such as None or[].
- z2jh's configuration could benefit from another state to track if
Brainstorming
-
Do we want KubeSpawner to override
cmd?I'm thinking using
["jupyterhub-singleuser"]as a defaultcmdcould be fine, and removing the override in kubespawner would reduce some complexity perhaps?If we do this though, we need z2jh to be able to set the command to
Noneand[]respectively, which both are different. I think the way to do that is via a new helperset_config_if_defined()in z2jh's provided utility functions for use by z2jh's jupyterhub_config.py. -
Why do we set
c.Spawner.cmdandc.Spawner.default_urlinstead ofc.KubeSpawner.cmdandc.KubeSpawner.default_urlin z2jh's jupyterhub_config.py? It seems weird that KubeSpawner overridescmd, but we setc.Spawner.cmd. -
Is there an action point to take pre z2jh 2.0.0 release?
Overall, I hope to resolve this once and for all and don't mind doing work to make changes if we can come up with changes that makes sense to make.
Related
Do we want KubeSpawner to override cmd?
The default is less important to me; what's important to me is that the "use the image's command" case is clear and simple. If we can accomplish that, however we accomplish that, I'm okay with switching back to jupyterhub-singleuser to match other Spawners.
Whatever we do with the default, I think it's probably still better for the z2jh approach to specify either the full singleuser.cmd, or whatever special value (None, empty list, '$imageCommand'), but not expose (via documentation / schema) Spawner.args due to the fact that it only works when singleuser.cmd is also specified.
Why do we set c.Spawner.cmd and c.Spawner.default_url instead of c.KubeSpawner.cmd and c.KubeSpawner.default_url in z2jh's jupyterhub_config.py?
It doesn't make a difference in our case, but it's typical to set config either on the class that defines the trait, or on the subclass you know you are using. It makes no difference when there is only one class you are configuring, but if another class came up that was a sibling of KubeSpawner (inherits from Spawner but not KubeSpawner), there is a distinction:
- Setting it on Spawner would affect both KubeSpawner and CustomSpawner
- Setting on KubeSpawner would only affect KubeSpawner, not CustomSpawner
That's mostly hypothetical in the current situation where there's really only one class to configure (or possible custom subclasses of KubeSpaner), so if things are clearer to set them on KubeSpawner only, that's okay, too. I would typically apply config to the class that defines the trait, though.
Traitlets config loading looks through the class inheritance, so it'll find it on Spawner, KubeSpawner, or anything in between that has the trait (if there were something in between).
Is there an action point to take pre z2jh 2.0.0 release?
I guess we need to decide if we want to revert the "use image by default" change. If not, then document the two choices (singleuser.cmd or 'default').
If we go back to jupyterhub-singleuser, then we need to revert the change, and make sure to implement and document the 'use the image' option, since that not working is what started the whole process.