milvus-helm
Cannot run Milvus in our OpenShift cluster with (runAsUser, runAsGroup, fsGroup) - How to remove them?
Hi Team,
We tried to deploy Milvus on our OpenShift cluster using your Helm templates (https://github.com/zilliztech/milvus-helm/tree/master/charts/milvus). Our OpenShift team (the Kubernetes cluster admins) won't let us specify any security context (runAsUser, runAsGroup, fsGroup) for pods, deployments, replicasets, or statefulsets, so we should not be specifying the settings below.
# runAsUser: 1000
# runAsGroup: 1000
# fsGroup: 1000
So I commented them out and tried to install Milvus, but it does not work.
None of my pods start, and I see the following errors.
Please advise on how to proceed.
Thanks! Tharun M
I thought you used MinIO as the local storage?
Milvus itself doesn't set this securityContext. It is the third-party charts, such as MinIO, etcd, and Pulsar, that set it; they do so to avoid running as root, which is more secure.
I found the settings in minio/values.yaml
## Add stateful containers to have security context, if enabled MinIO will run as this
## user and group NOTE: securityContext is only enabled if persistence.enabled=true
securityContext:
  enabled: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
You could use OpenShift cloud storage to store the data: disable MinIO and set the external bucket in your values. Give this a try:
minio:
  enabled: false
externalS3:
  enabled: true
  host: xxxxx
  port: 443
  rootPath: xxxxx
  bucketName: xxxx
  cloudProvider: xxx
  useSSL: true
  accessKey: "xxx"
  secretKey: "xxx"
Alternatively, find the MinIO persistence settings (minio.persistence) in values.yaml, disable persistence, and try again.
You can find more information about the MinIO settings in the minio chart (https://github.com/zilliztech/milvus-helm/blob/master/charts/milvus/charts/minio-8.0.17.tgz).
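As a rough sketch of this second option, a values override along these lines might work. The key names come from the minio/values.yaml excerpt quoted above; the file name is hypothetical, and whether this is sufficient on a given OpenShift cluster is an assumption that has not been verified here.

```yaml
# values-openshift.yaml (hypothetical file name)
minio:
  # Per the chart comment, securityContext only applies when persistence is
  # enabled, so turning persistence off should also drop
  # runAsUser/runAsGroup/fsGroup.
  # Note: without persistence, MinIO data does not survive pod restarts.
  persistence:
    enabled: false
  # Alternative (assumption): keep persistence and switch off only the
  # securityContext block shown in minio/values.yaml.
  # securityContext:
  #   enabled: false
```

The commented-out alternative keeps the data on a persistent volume, at the cost of relying on the cluster to assign a UID that can write to it.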
Hi, Thanks for your response.
I can try that method for the third-party components, but before that, how do I get the Milvus components working?
In the deployments for each of those components (datacoord, datanode, indexcoord, indexnode, proxy, querycoord, querynode, rootcoord), it looks like you are using two images:
- Init container: milvusdb/milvus-config-tool:v0.1.2
- Main container: milvusdb/milvus:v2.3.3
In our case, none of these 8 Milvus-specific deployments/pods start, and each of them appears to fail with exactly the same error: 2023/12/21 14:55:03 write failed: open /milvus/configs/milvus.yaml: permission denied
See screenshots.
Screenshots (images not reproduced here): Overall, Data Coord, Index Coord, Indexnode, Datanode, Proxy, QueryCoord, Querynode, rootCoord.
Upon further investigation, I see that the config-tool image seems to have the uid/gid 65532:65532 baked in?
Can you help me with what else I can do to get these pods into the Running state? Please note that our OpenShift cluster won't let us run containers as any specific user, and won't accept a user/group such as 65532:65532 baked into images.
Also, can you elaborate on whether it is hardcoded in the config-tool image or in the Milvus images? Why am I getting permission denied on each of those pods?
Thanks for your support. Tharun
Could you write down your steps? For example, which files you modified, and which methods you used to deploy them, so that I can better understand your changes.
I have also faced the same issue on an OpenShift cluster. I managed to sort out some of these problems by adding a service account with an extra SCC for running as root; however, the third-party components require running as root as well. Is there a way to get around that too? I tried overriding some of these Helm chart settings manually but am having some trouble.
There are multiple things to adjust for Milvus to deploy properly on OpenShift:
- The Milvus container image needs a modification to make the folder /milvus writable by gid 0. I just made a PR for that. If you want to test, a container image implementing this change is available here.
- Security Contexts have to be fixed for Pulsar and a few other deployments or statefulsets.
- Pulsar must be updated to version 2.10.5. The current version, 2.8.2, cannot run as non-root.
- The port used by Pulsar for Prometheus, 80, cannot be bound on OpenShift; it must be changed to a higher one, such as 8080.
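To illustrate the security context and port points, a pod template that fits OpenShift's default restricted policy typically looks something like the fragment below. This is a generic Kubernetes sketch and not an excerpt from the attached diff; the container name and image are placeholders. It leaves out runAsUser, runAsGroup, and fsGroup so that OpenShift can assign an arbitrary UID (which runs with gid 0, hence the need for /milvus to be writable by gid 0).

```yaml
# Hypothetical pod template fragment, not taken from the attached diff.
spec:
  securityContext: {}            # no runAsUser / runAsGroup / fsGroup set here
  containers:
    - name: example              # placeholder name
      image: example/image:tag   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      ports:
        - name: http
          containerPort: 8080    # unprivileged port instead of 80
```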
I generated a manifest for a full cluster deployment, made the modifications, and generated a diff file (attached, renamed to txt to allow upload) that reflects all those changes. From there, if it's OK with you @wyfeng001, or whoever is in charge, we could look into incorporating all those changes into the chart, or create a specific one for OpenShift. milvus_manifest.diff.txt
Hi @guimou! Thank you very much for providing a solution!
I just checked the manifest.diff. The default Milvus image will be updated after your PR for Milvus is merged and released.
It's unlikely that we will change the default Pulsar version in the short term, because v2.8.2 is quite stable and is sufficient for the current Milvus version, and an upgrade across 2 major versions may be too aggressive. However, we can maintain a Pulsar image repo for Milvus to solve this, like we did for the etcd image: https://github.com/milvus-io/bitnami-docker-etcd/.
/assign @LoveEachDay
@haorenfsa Yeah, I did not like updating Pulsar version, but to the extent of the limited tests I did it worked! :smile: The changes were introduced only with version 2.10.0, with this PR. Would your suggestion be to fork the pulsar repo from 2.8.2 and apply those changes (if feasible without impact)? Then rebuild and use this patched version of Pulsar for OpenShift deployments?
@guimou Yes, in fact it's not the version of pulsar we need to change, but the Dockerfile that the pulsar image is built with.
However, I just found that the Milvus community is considering upgrading pulsar to 3.0 due to this issue https://github.com/apache/pulsar/issues/14779, which may influence the stability of Milvus significantly.
I'll keep updating the future plan of Milvus community here, based on which we'll then decide whether we need to fork & patch pulsar 2.8.2 to solve this in the short term.
Meanwhile, for people still struggling, a full recipe to deploy Milvus (Standalone or Cluster-mode) on OpenShift is available here (as well as example notebooks for ingestion/query, RAG chatbot recipes,... in the same repo).
> However, I just found that the Milvus community is considering upgrading pulsar to 3.0 due to this issue https://github.com/apache/pulsar/issues/14779, which may influence the stability of Milvus significantly.
It looks like this is not going to happen within the next few months. Let's consider polishing the current Pulsar v2.8.2 Docker image instead.
Things to be done, by my rough estimate:
- [ ] maintain a Dockerfile to polish the Pulsar image and push it to milvusdb/pulsar
- [ ] update milvus-helm's default image to milvusdb/pulsar