enterprise_gateway
Document how to use the conditional volume mounting logic in kernel pod template
One of the features added in PR #629 was the ability for users to specify per-user volume mounting by specifying KERNEL_VOLUME_MOUNTS and KERNEL_VOLUMES values in the env stanza of the api/kernels POST request's JSON body. We need to document an example of how that would be accomplished. It could be as simple as taking the example from #629.
This issue came up during the review of PR #749 - which references the location within the docs.
@kevin-bates any updates on this?
I'm trying to mount a volume to the dynamically generated kernel pod.
The desired mountPath would ideally be dynamic.
Trivial Example:
- userA requests a kernel and gets a mountPath /some/path/userA
- userB requests a kernel and gets a mountPath /some/path/userB
How do I go about doing something along these lines?
I've already taken a look at #629 but I'm still unsure how to proceed.
You mentioned here that we can mount volumes...
by specifying KERNEL_VOLUME_MOUNTS and KERNEL_VOLUMES values in the env stanza of the api/kernels POST request's json body
Are these REST APIs with the env fields documented anywhere at the moment?
I'm currently playing with the enterprise-gateway.yaml for Kubernetes: https://github.com/jupyter/enterprise_gateway/blob/master/etc/kubernetes/enterprise-gateway.yaml
Thanks!
Hi @pakatagoh - I apologize for the lack of documentation. Frankly, I haven't used the volume mount capabilities that often and have been relying on the community to contribute their findings back, which hasn't been as graceful as I'd like. Nonetheless, let's try to get you unblocked.
I'm currently playing with the enterprise-gateway.yaml for Kubernetes
The template you should be looking at is the kernel-pod.yaml.j2 file that gets placed into each kernelspec's scripts directory when we assemble our example kernelspec offerings. The intention here is to enable admins to configure each kernelspec's pod specifically for that kernel. Some may use GPUs and require appropriate resource limits, etc., while others not so much. The enterprise-gateway.yaml file is used to deploy the EG server itself.
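For orientation, the conditional volume logic in that template is along these lines (a simplified sketch; consult the actual kernel-pod.yaml.j2 shipped with the kernelspecs for the authoritative form):

```yaml
# Simplified sketch of kernel-pod.yaml.j2's conditional volume handling.
# The kernel_volume_mounts/kernel_volumes variables are only rendered
# when the corresponding KERNEL_* envs were provided in the request.
spec:
  containers:
  - name: kernel
    image: "{{ kernel_image }}"
{% if kernel_volume_mounts %}
    volumeMounts:
      {{ kernel_volume_mounts }}
{% endif %}
{% if kernel_volumes %}
  volumes:
    {{ kernel_volumes }}
{% endif %}
```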
are these REST APIs with the env fields documented anywhere at the moment?
No - and it probably should be. I can give a brief overview.
If you're using a Notebook or Lab client (one that has been launched using --gateway-url), then all environment variables prefixed with KERNEL_ will be added into the env stanza of the body. In addition, any environment variables named in the GatewayClient.env_whitelist configurable or via the JUPYTER_GATEWAY_ENV_WHITELIST environment variable (a comma-separated list of env var names) will also have their name/value pairs added to the env body in the kernel's start request.
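Conceptually, that client-side selection behaves something like this Python sketch (illustrative only; this is not the actual GatewayClient implementation):

```python
import os

def collect_kernel_envs(env_whitelist):
    """Illustrative: pick the process env vars a gateway-aware client
    would place into the kernel start request's env stanza."""
    return {
        name: value
        for name, value in os.environ.items()
        if name.startswith("KERNEL_") or name in env_whitelist
    }

# KERNEL_-prefixed vars flow automatically; others need whitelisting.
os.environ["KERNEL_USERNAME"] = "userA"
os.environ["SPARK_HOME"] = "/opt/spark"
selected = collect_kernel_envs({"SPARK_HOME"})
```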
If you're using your own client and building the REST API call yourself, you're free to add any envs you like.
What envs are accepted by EG is also a function of configuration. EG will accept all KERNEL_ envs. Any others must be specified via EG's env_whitelist configurable. Please note that the 2.4.0 release (as of yesterday) allows a value of '*' to be specified, which passes all values in the env body along to the kernel.
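The server-side filtering can be pictured the same way (again just a sketch of the described behavior, not EG's actual code):

```python
def eg_accepted_envs(request_envs, env_whitelist):
    """Illustrative: EG always accepts KERNEL_-prefixed envs; others must
    be whitelisted, and (since 2.4.0) a '*' whitelist accepts everything."""
    if "*" in env_whitelist:
        return dict(request_envs)
    return {
        name: value
        for name, value in request_envs.items()
        if name.startswith("KERNEL_") or name in env_whitelist
    }

requested = {"KERNEL_USERNAME": "userA", "MY_VAR": "x", "SECRET": "y"}
filtered = eg_accepted_envs(requested, ["MY_VAR"])  # drops SECRET
passthrough = eg_accepted_envs(requested, ["*"])    # keeps everything
```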
When KERNEL_ envs are applied to the kernel-pod.yaml.j2 template, they will have been lowercased, so template variables like kernel_volume_mounts and kernel_volumes correspond to the envs KERNEL_VOLUME_MOUNTS and KERNEL_VOLUMES, respectively.
Also note that if the only variation is the user's name (which should be represented by KERNEL_USERNAME), you could configure volume mounts with {{ kernel_username }} as the only variable.
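So for the userA/userB scenario above, a kernelspec admin could bake the mount into the template and let the username supply the variation; a hypothetical fragment (volume name and host paths invented):

```yaml
# Hypothetical kernel-pod.yaml.j2 fragment: a fixed volume whose paths
# vary only by KERNEL_USERNAME (available lowercased as kernel_username).
spec:
  containers:
  - name: kernel
    volumeMounts:
    - name: user-scratch
      mountPath: "/some/path/{{ kernel_username }}"
  volumes:
  - name: user-scratch
    hostPath:
      path: "/mnt/scratch/{{ kernel_username }}"
```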
You're free to adjust the kernel-pod.yaml.j2 contents however you like for each of your kernel specs.
I hope this helps.
@kevin-bates No need for the apologies. Thanks for the help so far! Since I'm very new to the Jupyter ecosystem, learning and creating the mental model on how things work is surely a challenge.
The template you should be looking at is the kernel-pod.yaml.j2 file that gets placed into each kernelspec's scripts directory when we assemble our example kernelspec offerings. The intention here is to enable admins to configure each kernelspec's pod specifically for that kernel. Some may use GPUs and require appropriate resource limits, etc., while others not so much. The enterprise-gateway.yaml file is used to deploy the EG server itself.
Understood on the paragraph above. So far, I was able to deploy the EG, and sending a kernel request to the EG will spin up a new pod based on the kernel-pod.yaml.j2 file and any envs passed in the JSON body. I can see that happening using kubectl.
Below was the request sent to EG. I didn't realize that KERNEL_VOLUMES and KERNEL_VOLUME_MOUNTS needed their values to be strings.
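Since the screenshot isn't reproduced here: the key point is that both values are list fragments encoded as strings inside the JSON body, roughly like the following (names and paths hypothetical):

```json
{
  "name": "python_kubernetes",
  "env": {
    "KERNEL_USERNAME": "pakata-username",
    "KERNEL_VOLUMES": "[{'name': 'user-vol', 'emptyDir': {}}]",
    "KERNEL_VOLUME_MOUNTS": "[{'name': 'user-vol', 'mountPath': '/some/path/pakata-username'}]"
  }
}
```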
If you're using your own client and building the REST API call yourself, you're free to add any envs you like.
Yes, my team and I are currently using our own client, but with the help of the @jupyterlab/services package. Would you be familiar with this package? We're currently exploring and making sure we can send these dynamic envs via this package. I'll be working on sending these envs programmatically, and hopefully there's a way to do so using this package from @jupyterlab.
Would you be familiar with this package? Because we're currently exploring and making sure we can send these dynamic env's via this package. Will currently be working on sending these env's programmatically and hopefully, there's a way to do so using this package from @jupyterlab
I'm somewhat familiar with @jupyterlab/services, but only to the extent of looking into issues (like https://github.com/jupyterlab/jupyterlab/issues/8013).
I would imagine Lab (like the Notebook frontend) would go through a "session" to start a kernel. However, if the target of @jupyterlab/services is EG, the sessions handler will come directly from Notebook (since EG only inherits those handlers), and the session handler's create_session method would drop the env key in the request's body.
In scanning both the kernel and session sub-packages, I don't see anywhere that env is exposed for setting into the body. Since Notebook's kernel start logic also doesn't rely on an env entry (unlike EG's kernel start logic), I guess I'm not surprised.
Looking at the image above, you should not send KERNEL_NAME in the request. It will be set by EG based on the "name" property and is primarily for troubleshooting (IIRC).
It looks like a python_kubernetes kernel was launched. However, I suspect that because you probably hit the Notebook's POST handler, none of the envs were conveyed and you're essentially using defaults. For example, is the namespace in which the kernel pod resides of the form jovyan-<kernel-id>? Since you conveyed a KERNEL_USERNAME value of pakata-username, the kernel's namespace should be pakata-username-<kernel-id> (when the envs flow), unless you've set EG_SHARED_NAMESPACE=true during deployment.
I don't know if Lab's services package exposes a kernel start capability that hits /api/kernels directly (and doesn't go through /api/sessions), but that's what you need to do (in addition to plumbing env support into the kernel start request's body).
What other "services" from that package are you needing? The thought being that if it's only kernel (and kernelspec), you could probably be more direct. However, if you need things like contents, then session may be what ties things together in the frontend.
Below is the current enterprise-gateway deployment in the YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: enterprise-gateway
  namespace: enterprise-gateway
  labels:
    gateway-selector: enterprise-gateway
    app: enterprise-gateway
    component: enterprise-gateway
spec:
  # Uncomment/Update to deploy multiple replicas of EG
  # replicas: 5
  selector:
    matchLabels:
      gateway-selector: enterprise-gateway
  template:
    metadata:
      labels:
        gateway-selector: enterprise-gateway
        app: enterprise-gateway
        component: enterprise-gateway
    spec:
      # Created above.
      serviceAccountName: enterprise-gateway-sa
      containers:
      - env:
        - name: EG_PORT
          value: "8888"
        # Created above.
        - name: EG_NAMESPACE
          value: "enterprise-gateway"
        # Created above. Used if no KERNEL_NAMESPACE is provided by client.
        - name: EG_KERNEL_CLUSTER_ROLE
          value: "kernel-controller"
        # All kernels reside in the EG namespace if True, otherwise KERNEL_NAMESPACE
        # must be provided or one will be created for each kernel.
        - name: EG_SHARED_NAMESPACE
          value: "True"
        # NOTE: This requires appropriate volume mounts to make notebook dir accessible
        - name: EG_MIRROR_WORKING_DIRS
          value: "True"
        # Current idle timeout is 1 hour.
        - name: EG_CULL_IDLE_TIMEOUT
          value: "3600"
        - name: EG_LOG_LEVEL
          value: "DEBUG"
        - name: EG_KERNEL_LAUNCH_TIMEOUT
          value: "60"
        - name: EG_KERNEL_WHITELIST
          value: "'r_kubernetes','python_kubernetes','python_tf_kubernetes','python_tf_gpu_kubernetes','scala_kubernetes','spark_r_kubernetes','spark_python_kubernetes','spark_scala_kubernetes'"
        - name: EG_DEFAULT_KERNEL_NAME
          value: "python_kubernetes"
        - name: EG_ENV_WHITELIST
          value: "['KERNEL_NAME','KERNEL_VOLUMES', 'KERNEL_VOLUME_MOUNTS']"
        # Ensure the following VERSION tag is updated to the version of Enterprise Gateway you wish to run
        image: elyra/enterprise-gateway:2.4.0
        # Use IfNotPresent policy so that dev-based systems don't automatically
        # update. This provides more control. Since formal tags will be release-specific
        # this policy should be sufficient for them as well.
        imagePullPolicy: IfNotPresent
        name: enterprise-gateway
        ports:
        - containerPort: 8888
Looking at the image above, you should not send KERNEL_NAME in the request. It will be set by EG based on the "name" property and is primarily for troubleshooting (IIRC).
Noted on the KERNEL_NAME you mentioned above. I've altered the request body as such:
The image below shows the output from kubectl get pods -A after sending the above POST request. It created a pod with my username, but under the enterprise-gateway namespace. Is this the right behaviour? I suppose so, since my current setup has EG_SHARED_NAMESPACE set to True.
And yes, sending envs under the api/sessions POST request body doesn't work, as per your explanation above about session handlers not making use of envs. See below for the request sent in.
The output: the jovyan user was used.
Regarding the following:
I don't know if lab's services package exposes a kernel start capability that hits /api/kernels directly (and doesn't go through /api/sessions), but that's what you need to do (in addition to plumbing env support into the kernel start request's body).
I believe it does, though the types for envs aren't supported when using TypeScript. I had to ignore it to stop TS from screaming at me (KernelManager and SessionManager are imports from @jupyterlab/services).
Below is a snippet from a Node.js websocket server; on connection to the server, a new kernel will be created.
Here is the output from k8s upon making a connection to this Node.js server:
Looking at the source, I guess I'm "lucky" that I could still pass env as a field in kernel.startNew({createOptions}), as @jupyterlab/services just throws the entire createOptions object into the body of the /api/kernels POST request.
The same would go for session.startNew(), but since the handlers on EG drop the envs, using a session manager to start kernels won't play nicely with EG.
I suppose using sessions would have been nice for this feature, as mentioned here in the JupyterLab services README:
The primary usecase of a session is to enable persisting a connection to a kernel. For example, a notebook viewer may start a session with session path of the notebook's file path. When a browser is refreshed, the notebook viewer can connect to the same kernel by asking the server for the session corresponding with the notebook file path.
I suppose for now, I'll just make use of the KernelManager.
I suppose the discussion has gotten a little off topic as well 😂.
For anyone looking to mount volumes on pods via envs, see this comment above 👍🏻