
Unable to change user InitContainers configured when launching JupyterLab

Open samuel-co opened this issue 6 months ago • 5 comments

Context

I am unable to customize the InitContainers that run when a user launches their JupyterLab instance. This is a problem because we are hitting the Docker rate limit in our network and cannot change the source of the busybox:1.31 image used by those InitContainers.

The InitContainers are currently hardcoded within Nebari, specifically in the JupyterHub configuration here. We cannot update the image source through the Kubernetes API, as the InitContainers are set by the linked Python script when a user launches their JupyterLab instance. We cannot pre-stage the image on our nodes because the launch_template configuration is unavailable. We cannot override the InitContainer images using the kubespawner_overrides configuration, as it doesn't merge the specified init_containers with the Python-defined ones, resulting in duplicate InitContainer names that break the launch.

Any advice on how to update the image source of the JupyterLab InitContainers would be greatly appreciated. We run Nebari on a shared internal network with a limited number of NAT IP addresses, so we hit the Docker rate limit frequently. Thanks!

Value and/or benefit

Having the ability to customize where these containers are pulled from would allow us to source images from trusted sources that are not rate-limited.

Anything else?

The ability to use Nebari's launch_template configuration within AWS would allow us to mitigate the issue, as we could stage AMIs with images already installed, or download the images from other locations using the pre_bootstrap_script. However, it appears the launch_template configuration is not supported, as including it in the config results in the following error:

$ nebari validate --config nebari-dev-config.yml
ERROR validating configuration 
nebari-dev-config.yml
1 validation error for ConfigSchema
amazon_web_services.node_groups.user
  Value error, The 'launch_template' field is currently unavailable and has been
removed from the configuration schema.
Please omit this field until it is reintroduced in a future update. 
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
Aborted.

samuel-co avatar Jul 16 '25 23:07 samuel-co

Hey @samuel-co, sorry for the late reply. Regarding your question: currently, there is no user-facing method to override the busybox images. I don't mind this being included as part of the overrides, so a PR is welcome.

If you want to override that right now, there are two ways of doing so, though neither is a persistent change:

  • Under your Kubernetes Nebari cluster, you have a hub secret that contains all the data that is used for generating the "run-time" manifest for the user JupyterLab, including the initContainers. You can download that secret, decode it, and update it, then push it again using kubectl.
  • Alternatively, you can edit the files under /stages/07-kubernetes-services/modules/kubernetes/services/jupyterhub/files/jupyterhub and then, during deployment, use
nebari deploy -c nebari-config.yaml --disable-render

This will allow you to override anything in the code structure, though keep in mind that with that extra flag, some changes made to nebari-config.yaml might not show up, because rendering is skipped.
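The decode/update/re-encode step of the first option could be sketched roughly as follows. This is only an illustration: the key name `values.yaml`, the payload, and the helper names are assumptions made for the example and should be checked against the actual secret; the only fixed fact relied on is that Kubernetes stores Secret `data` values base64-encoded.

```python
import base64
import copy
import json


def decode_secret_key(secret: dict, key: str) -> str:
    """Return the base64-decoded text stored under `key` in a Secret manifest."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


def with_updated_key(secret: dict, key: str, new_text: str) -> dict:
    """Return a copy of the Secret with `new_text` re-encoded under `key`."""
    updated = copy.deepcopy(secret)
    updated["data"][key] = base64.b64encode(new_text.encode("utf-8")).decode("ascii")
    return updated


# Assumption: the key name is a guess; list the keys in your own secret first.
SECRET_KEY = "values.yaml"

# Hypothetical stand-in for the JSON printed by
# `kubectl get secret hub -n <namespace> -o json`; the payload is a placeholder.
placeholder_payload = (
    "singleuser:\n"
    "  initContainers:\n"
    "  - image: busybox:1.31\n"
    "    name: initialize-home-mount\n"
)
example_secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "hub"},
    "data": {
        SECRET_KEY: base64.b64encode(placeholder_payload.encode("utf-8")).decode("ascii")
    },
}

# Decode, edit the text, and re-encode it into a new manifest.
text = decode_secret_key(example_secret, SECRET_KEY)
edited = text.replace(
    "busybox:1.31", "123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31"
)
patched = with_updated_key(example_secret, SECRET_KEY, edited)
print(json.dumps(patched, indent=2))
```

The patched manifest could then be written to a file and pushed back with `kubectl apply -f`. As noted above, a change made this way is not persistent.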

viniciusdc avatar Jul 22 '25 14:07 viniciusdc

Regarding the AWS launch_template: it has a bug right now that needs to be fixed before this field can be used again: https://github.com/nebari-dev/nebari/issues/2832

viniciusdc avatar Jul 22 '25 14:07 viniciusdc

Thanks for the response @viniciusdc!

I tried following the first method (downloading the hub secret, updating it, and pushing it back up), and had a question. Where in the hub secret's config would you suggest setting the JupyterLab instance's initContainers? I already have the following configuration in the hub secret from using the kubespawner_overrides in the configuration file. For example, in the decoded hub secret, I see the following (changed the AWS Account ID to 123456789012):

singleuser:
  image:
    name: 123456789012.dkr.ecr.us-west-2.amazonaws.com/nebari-jupyterlab
    pullPolicy: null
    pullSecrets: []
    tag: 2025.2.1
  initContainers:
  - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
    name: initialize-home-mount
  - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
    name: initialize-shared-mounts
  - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
    name: initialize-conda-store-mounts

The problem is that, with that configuration in the secret, launching a new JupyterLab instance causes the spawner to create a pod with those 3 initContainers and the same 3 named initContainers defined by Nebari. This throws an error immediately because of the duplicate container names. That is, it appears Nebari appends the 3 initContainers defined in this file to the initContainers defined in the hub secret, instead of overwriting or merging them.

Is there a different location in the hub secret where I should be overriding the initContainers?

samuel-co avatar Jul 29 '25 22:07 samuel-co

Hmm... I guess that might be due to how they are being merged. I guess the best course of action would be to add an overrides attribute to the nebari-config.yaml schema that allows overriding the init containers. Can you try having a look at the JupyterHub overrides, specifically:

jupyterhub:
  overrides:
    hub:
      initContainers:
      - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
        name: initialize-home-mount
      - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
        name: initialize-shared-mounts
      - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
        name: initialize-conda-store-mounts

That might do the trick, since we use deep_merge at the end of the file to allow targeted overrides: https://github.com/nebari-dev/nebari/blob/4679c3a640e34704e8974daae782b5f44a9171e3/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/jupyterhub/files/jupyterhub/03-profiles.py#L534-L547

viniciusdc avatar Aug 01 '25 00:08 viniciusdc

Apologies, I mistyped in my last comment. I meant to say using the overrides field, not kubespawner_overrides.

Using the following config results in the initContainer definition appearing in the hub secret, but not merging with the existing initContainer config. Note that you have to override the singleuser settings. Overriding the hub settings just adds these initContainers to the Hub pod, which isn't necessary and results in a different set of errors:

jupyterhub:
  overrides:
    singleuser:
      initContainers:
      - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31
        name: initialize-home-mount

To further test the deep_merge, I stripped the 03-profiles.py file down to the very basics to see how deep_merge handles the override values. I uploaded the file below, but you can see that the resulting output does not merge the items inside the initContainers lists; it just joins the two lists together. That is, passing in the above override results in the following initContainers definition when a user node is launched (shortened to test with a single container for clarity). Because the list contains two separate containers with the name initialize-home-mount, the pod fails to initialize.

...
  "init_containers": [
    {
      "name": "initialize-home-mount",
      "image": "busybox:1.31",
      "command": [
        "sh",
        "-c",
        "mkdir -p /mnt/home/samuel-co && chmod 777 /mnt/home/samuel-co && find /etc/skel/. -maxdepth 1 -not -name '.' -not -name '..*' -exec cp -rL {{}} /mnt/home/samuel-co \\;"
      ],
      "securityContext": {
        "runAsUser": 0
      },
      "volumeMounts": [
        {
          "mountPath": "/mnt/home/samuel-co",
          "name": "home",
          "subPath": "home/samuel-co"
        },
        {
          "mountPath": "/etc/skel",
          "name": "skel"
        }
      ]
    },
    {
      "name": "initialize-home-mount",
      "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31"
    }
  ]
...

Here's the shortened 03-profiles.py file I used to test the deep_merge, test-merge-03-profiles.py.txt
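The list-joining behavior described above can be reproduced with a simplified stand-in. The function below is not Nebari's deep_merge; it is a minimal sketch that assumes the same observable semantics (dicts merged recursively, lists concatenated), and the name `concat_merge` is made up for the example.

```python
def concat_merge(base, override):
    """Merge dicts recursively; join lists end to end; otherwise override wins."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = concat_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        # Lists are joined, so entries are never matched against each other.
        return base + override
    return override


# Shortened version of the Nebari-defined container from the output above.
nebari_defined = {
    "init_containers": [
        {
            "name": "initialize-home-mount",
            "image": "busybox:1.31",
            "securityContext": {"runAsUser": 0},
        }
    ]
}

# The override from the config, reduced to the same shape.
user_override = {
    "init_containers": [
        {
            "name": "initialize-home-mount",
            "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31",
        }
    ]
}

result = concat_merge(nebari_defined, user_override)
names = [container["name"] for container in result["init_containers"]]
print(names)  # two entries with the same name
```

Both entries keep the name initialize-home-mount, which matches the duplicate-name failure seen when the pod is created.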

I unfortunately don't see an easy way to update the deep_merge behavior to actually check the contents of resources inside lists and merge those resources accordingly, especially because there would have to be some sort of prioritization between original and override keys. Do you have any ideas on how we could better propagate those overrides into the launch config?
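One possible direction, sketched here purely as an assumption and not as anything Nebari implements: treat a list as a keyed collection when every item is a dict with a `name` field, merge items that share a name (override keys win), and fall back to joining the lists otherwise. The function name `merge_by_name` is hypothetical.

```python
def merge_by_name(base, override, key="name"):
    """Deep merge in which lists of named dicts are merged item by item."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for k, value in override.items():
            merged[k] = merge_by_name(base[k], value, key) if k in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        named = all(isinstance(item, dict) and key in item for item in base + override)
        if not named:
            # Fallback for unnamed items: join the lists.
            return base + override
        merged = list(base)
        positions = {item[key]: pos for pos, item in enumerate(merged)}
        for item in override:
            if item[key] in positions:
                # Same name: merge into the existing entry, override keys win.
                pos = positions[item[key]]
                merged[pos] = merge_by_name(merged[pos], item, key)
            else:
                positions[item[key]] = len(merged)
                merged.append(item)
        return merged
    return override


nebari_defined = {
    "init_containers": [
        {
            "name": "initialize-home-mount",
            "image": "busybox:1.31",
            "securityContext": {"runAsUser": 0},
        }
    ]
}
user_override = {
    "init_containers": [
        {
            "name": "initialize-home-mount",
            "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/busybox:1.31",
        }
    ]
}

merged_config = merge_by_name(nebari_defined, user_override)
print(merged_config["init_containers"])
```

With this rule the override only needs to carry the name and the fields being changed. The trade-off is that nested lists without names, such as command, are still joined, so replacing them would need a separate convention.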

samuel-co avatar Aug 01 '25 15:08 samuel-co