[BUG] - conda environments fail to build
OS system and architecture in which you are running QHub
Ubuntu on GCP
Expected behavior
Creating a conda environment in the `filesystem` namespace (from the `qhub-config.yaml`) or in my personal namespace should build my environment (provided that it is a valid env).
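For reference, environments are declared in the `qhub-config.yaml` roughly like this (a minimal sketch based on the 0.4.x schema; the file name and package pins are illustrative):

```yaml
environments:
  "environment-example.yaml":
    name: example
    channels:
      - conda-forge
    dependencies:
      - python==3.9.13
      - ipykernel==6.15.1
```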
Actual behavior
When submitting a conda environment (in the `filesystem` namespace or in my personal namespace), it fails to build with the following error message (example of a build failing in the `filesystem` namespace):
```
Looking for: ['python==3.9.13', 'ipykernel==6.15.1', 'ipywidgets==7.7.1', 'qhub-dask==0.4.3', 'param==1.12.2', 'python-graphviz==0.20.1', 'matplotlib==3.3.2', 'panel==0.13.1', 'voila==0.3.6', 'streamlit==1.10.0', 'dash==2.6.1', 'cdsdashboards-singleuser==0.6.2']
Preparing transaction: ...working... failed
CondaError: Unable to create prefix directory '/home/conda/filesystem/7f7f767440c1987bc8eeacb1741b638c71c44f30ffb25d9e0503b6f2f4d9fe11-20220819-012441-874213-109-cds'.
Check that you have sufficient permissions.
```
How to Reproduce the problem?
Build any valid conda env from the conda-store endpoint or by adding it to the `qhub-config.yaml`, and it will fail to build.
Command output
No response
Versions and dependencies used.
qhub version: v0.4.4rc3
conda-store version: v0.4.9 or v0.4.11
Compute environment
No response
Integrations
No response
Anything else?
No response
@iameskild this has to do with a change that I made in the container default uid/gid. I'll provide a fix tomorrow morning
@costrouc @viniciusdc moving our slack conversation here for posterity.
CO: Issue is that conda-store in roughly 0.4.5+ now runs as user 1000 and not 0, so it no longer has
permissions in that folder. Not sure what the best route is. conda-store long term should not be running
as root. I might chmod + chown that directory for conda-store.
VC: I would say that long term each namespace/environment should use a permission UUID based on the
Keycloak permission system (though that might be a lot harder). For now, some kind of auto-migration
system from conda-store itself to move any environments and update their permissions would work, right?
VC: > chmod + chown that directory for conda-store
Could we have a conda-store group? Is that feasible? Then we don't need to worry about user permissions.
I think it makes sense to restrict conda-store's permissions.
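The shared-group idea above could look roughly like the sketch below. The real commands (`groupadd`, `chgrp`) need root and are only shown in comments; a temp directory stands in for `/home/conda` so the setgid behaviour can be demonstrated without privileges. All paths and names here are illustrative, not taken from the actual deployment.

```shell
# Hypothetical sketch: give /home/conda to a dedicated group and set the
# setgid bit so new prefixes inherit that group. With root this would be:
#   groupadd conda-store
#   chgrp -R conda-store /home/conda
#   chmod -R g+rwX /home/conda && chmod g+s /home/conda
# Below, a temp directory stands in for /home/conda.
CONDA_DIR=$(mktemp -d)
chmod 2775 "$CONDA_DIR"        # leading 2 = setgid on the directory
mkdir "$CONDA_DIR/new-env"     # simulates conda-store creating a new prefix
ls -ld "$CONDA_DIR/new-env"    # new dir inherits the parent's group and setgid bit
```

With this scheme, the worker only needs to be a member of the group rather than owning the files or running as root.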
As for how to go about ensuring this isn't a breaking change, could we add an `initContainer` as follows to the conda-store worker deployment:
```yaml
initContainers:
  - command:
      - /bin/chown
      - -R
      - "1000:1000"
      - /home/conda
    image: busybox:latest
    name: chmod-er
    securityContext:
      privileged: true
    volumeMounts:
      - mountPath: /home/conda
        mountPropagation: None
        name: storage
```
I've tested this today on quansight-beta.qhub.dev and it does appear to correctly change the permissions for the existing files/folders under `/home/conda`:

```
drwxr-xr-x 13 1000 1000 4096 Aug 22 23:03 [email protected]
```
However, I run into another permissions issue whenever I try to create a new env. The "default" gid still appears to be `root`:

```
drwxrwxr-x 13 1000 root 4096 Aug 22 23:04 6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5
```
Then, when conda-store tries to change ownership, the following issue arises. Logs from the `conda-store-worker`:

```
chown: changing ownership of '/home/conda/[email protected]/6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5': Operation not permitted
2022-08-22 23:04:15,296: WARNING/ForkPoolWorker-2] [CondaStoreWorker] ERROR | Command '['chown', '-R', '1000:1000', '/home/conda/[email protected]/6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5']' returned non-zero exit status 1.
```
I was able to get around this by adding `fsGroup: 1000` to the pod's `securityContext`:

```yaml
securityContext:
  fsGroup: 1000
```
The above solution works when updating existing deployments but fails for fresh deployments and when new users sign in. Although the deployment scripts complete successfully, the trouble is that new conda envs can't be created due to permissions issues. This is due to how the `initContainers` (added by the KubeSpawner) set the permissions for the mounted volumes (specifically the conda-store mount), see here.
Changing this permission to anything other than `root` will then break existing deployments. A solution might be to add another `initContainer` which correctly sets the permissions for all files/folders in `/home/conda` before the other `initContainers` are run. The last hurdle for this solution is making sure that this new `initContainer` is the first one executed.
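Kubernetes runs `initContainers` sequentially in the order they appear in the pod spec, so the ordering requirement can be met by prepending the permission-fixing container. A rough sketch (the second container is a placeholder standing in for whatever KubeSpawner injects, not its actual spec):

```yaml
initContainers:
  # Runs first: fix ownership of everything under /home/conda
  - name: chown-conda-home
    image: busybox:latest
    command: ["/bin/chown", "-R", "1000:1000", "/home/conda"]
    securityContext:
      privileged: true
    volumeMounts:
      - name: storage
        mountPath: /home/conda
  # Illustrative placeholder: KubeSpawner-added initContainers would
  # appear after the chown step and therefore run afterwards.
  - name: kubespawner-init
    image: busybox:latest
    command: ["sh", "-c", "true"]
```

Each init container must complete successfully before the next one starts, so a failure in the chown step would surface before any KubeSpawner-added step runs.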