[Bug]: fail to start milvus with GCP as externalS3
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.9
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): -
- OS(Ubuntu or CentOS): -
- CPU/Memory: -
- GPU: -
- Others: -
Current Behavior
Hello, the Milvus Query Node couldn't authorize with `externalS3` configured with `cloudProvider: "gcp"`, and therefore was not able to start. The rest of the components started properly, with no warnings or errors in the logs. I've looked through the Milvus codebase, but once I realised it's a CGo issue, I decided to report it.
Milvus is deployed on a Kubernetes cluster. My Kubernetes pod and service subnets are 10.236.64.0/18 and 10.236.0.0/18.
I am deploying Milvus using the Helm chart; here is my `values.yaml`:
```yaml
milvus:
  cluster:
    enabled: true
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: "30s"
      scrapeTimeout: "10s"
      additionalLabels:
        release: kube-prometheus-stack
  queryNode:
    replicas: 3
    extraEnv:
      - name: HTTPS_PROXY
        value: http://proxy:3128
      - name: HTTP_PROXY
        value: http://proxy:3128
      - name: NO_PROXY
        value: milvus-etcd,10.236.64.0/18,10.236.0.0/18,.svc.cluster.local,localhost,127.0.0.1,kubernetes.default.svc
  indexNode:
    extraEnv:
      - name: HTTPS_PROXY
        value: http://proxy:3128
      - name: NO_PROXY
        value: milvus-etcd,10.236.64.0/18,10.236.0.0/18,.svc.cluster.local,localhost,127.0.0.1,kubernetes.default.svc
    replicas: 3
  dataNode:
    extraEnv:
      - name: HTTPS_PROXY
        value: http://proxy:3128
      - name: NO_PROXY
        value: milvus-etcd,10.236.64.0/18,10.236.0.0/18,.svc.cluster.local,localhost,127.0.0.1,kubernetes.default.svc
    replicas: 3
  minio:
    enabled: false
  indexCoordinator:
    extraEnv:
      - name: HTTPS_PROXY
        value: http://proxy:3128
      - name: NO_PROXY
        value: milvus-etcd,10.236.64.0/18,10.236.0.0/18,.svc.cluster.local,localhost,127.0.0.1,kubernetes.default.svc
  dataCoordinator:
    extraEnv:
      - name: HTTPS_PROXY
        value: http://proxy:3128
      - name: NO_PROXY
        value: milvus-etcd,10.236.64.0/18,10.236.0.0/18,.svc.cluster.local,localhost,127.0.0.1,kubernetes.default.svc
  externalS3:
    enabled: true
    bucketName: milvus-dev
    host: storage.googleapis.com
    port: 443
    cloudProvider: "gcp"
    useSSL: true
    useIAM: false
    rootPath: "milvus"
    accessKey: <key>
    secretKey: <key>
```
Expected Behavior
Query Node to be able to authorize with GCS
Steps To Reproduce
No response
Milvus Log
2023/06/07 18:22:59 maxprocs: Leaving GOMAXPROCS=32: CPU quota undefined
__ _________ _ ____ ______
/ |/ / _/ /| | / / / / / __/
/ /|_/ // // /_| |/ / /_/ /\ \
/_/ /_/___/____/___/\____/___/
Welcome to use Milvus!
Version: v2.2.9
Built: Fri Jun 2 09:38:35 UTC 2023
GitCommit: 9ffcd53b
GoVersion: go version go1.18.3 linux/amd64
open pid file: /run/milvus/querynode.pid
lock pid file: /run/milvus/querynode.pid
[2023/06/07 18:22:59.107 +00:00] [INFO] [roles/roles.go:226] ["starting running Milvus components"]
[2023/06/07 18:22:59.107 +00:00] [INFO] [roles/roles.go:152] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so]
[2023/06/07 18:22:59.107 +00:00] [INFO] [management/server.go:68] ["management listen"] [addr=:9091]
[2023/06/07 18:22:59.120 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] [diskQuota(MB)=+inf]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32]
[2023/06/07 18:22:59.121 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25]
[2023/06/07 18:22:59.122 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"]
[2023/06/07 18:22:59.122 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}]
[2023/06/07 18:22:59.122 +00:00] [ERROR] [querynode/query_node.go:188] ["load queryhook failed"] [error="fail to set the querynode plugin path"] [stack="github.com/milvus-io/milvus/internal/querynode.NewQueryNode\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/query_node.go:188\ngithub.com/milvus-io/milvus/internal/distributed/querynode.NewServer\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:83\ngithub.com/milvus-io/milvus/cmd/components.NewQueryNode\n\t/go/src/github.com/milvus-io/milvus/cmd/components/query_node.go:40\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:110"]
[2023/06/07 18:22:59.132 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"]
[2023/06/07 18:22:59.133 +00:00] [DEBUG] [paramtable/grpc_param.go:153] [initServerMaxSendSize] [role=querynode] [grpc.serverMaxSendSize=536870912]
[2023/06/07 18:22:59.133 +00:00] [DEBUG] [paramtable/grpc_param.go:175] [initServerMaxRecvSize] [role=querynode] [grpc.serverMaxRecvSize=536870912]
[2023/06/07 18:22:59.133 +00:00] [INFO] [querynode/service.go:106] [QueryNode] [port=21123]
[2023/06/07 18:22:59.134 +00:00] [INFO] [querynode/service.go:122] ["QueryNode connect to etcd successfully"]
[2023/06/07 18:22:59.234 +00:00] [INFO] [querynode/service.go:132] [QueryNode] [State=Initializing]
[2023/06/07 18:22:59.234 +00:00] [INFO] [querynode/query_node.go:299] ["QueryNode session info"] [metaPath=by-dev/meta]
[2023/06/07 18:22:59.234 +00:00] [INFO] [sessionutil/session_util.go:202] ["Session try to connect to etcd"]
[2023/06/07 18:22:59.235 +00:00] [INFO] [sessionutil/session_util.go:217] ["Session connect to etcd success"]
[2023/06/07 18:22:59.243 +00:00] [INFO] [sessionutil/session_util.go:300] ["Session get serverID success"] [key=id] [ServerId=594]
[2023/06/07 18:22:59.253 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] [diskQuota(MB)=+inf]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32]
[2023/06/07 18:22:59.253 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25]
[2023/06/07 18:22:59.254 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"]
[2023/06/07 18:22:59.254 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}]
[2023/06/07 18:22:59.255 +00:00] [INFO] [logutil/logutil.go:165] ["Log directory"] [configDir=]
[2023/06/07 18:22:59.255 +00:00] [INFO] [logutil/logutil.go:166] ["Set log file to "] [path=]
[2023/06/07 18:22:59.255 +00:00] [INFO] [querynode/query_node.go:209] ["QueryNode init session"] [nodeID=594] ["node address"=10.236.72.81:21123]
[2023/06/07 18:22:59.255 +00:00] [INFO] [querynode/query_node.go:315] ["QueryNode init rateCollector done"] [nodeID=594]
[2023/06/07 18:22:59.695 +00:00] [INFO] [storage/minio_chunk_manager.go:145] ["minio chunk manager init success."] [bucketname=milvus-dev] [root=milvus]
[2023/06/07 18:22:59.695 +00:00] [INFO] [querynode/query_node.go:325] ["queryNode try to connect etcd success"] [MetaRootPath=by-dev/meta]
[2023/06/07 18:22:59.695 +00:00] [INFO] [querynode/segment_loader.go:945] ["SegmentLoader created"] [ioPoolSize=256] [cpuPoolSize=32]
2023-06-07 18:22:59,696 INFO [default] [KNOWHERE][SetBlasThreshold][milvus] Set faiss::distance_compute_blas_threshold to 16384
2023-06-07 18:22:59,696 INFO [default] [KNOWHERE][SetEarlyStopThreshold][milvus] Set faiss::early_stop_threshold to 0
2023-06-07 18:22:59,696 INFO [default] [KNOWHERE][SetStatisticsLevel][milvus] Set knowhere::STATISTICS_LEVEL to 0
2023-06-07 18:22:59,696 | DEBUG | default | [SERVER][operator()][milvus] Config easylogging with yaml file: /milvus/configs/easylogging.yaml
2023-06-07 18:22:59,697 | DEBUG | default | [SEGCORE][SegcoreSetSimdType][milvus] set config simd_type: auto
2023-06-07 18:22:59,697 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS expect simdType::AUTO
2023-06-07 18:22:59,697 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS hook AVX2
2023-06-07 18:22:59,697 | DEBUG | default | [SEGCORE][SetIndexSliceSize][milvus] set config index slice size(byte): 16777216
2023-06-07 18:22:59,697 | DEBUG | default | [SEGCORE][SetThreadCoreCoefficient][milvus] set thread pool core coefficient: 10
[2023/06/07 18:22:59.719 +00:00] [WARN] [initcore/init_storage_config.go:94] ["InitRemoteChunkManagerSingleton failed, C Runtime Exception: [UnexpectedError] get authorization failed, errcode:UNAVAILABLE\n"]
[2023/06/07 18:22:59.719 +00:00] [ERROR] [querynode/query_node.go:348] ["QueryNode init segcore failed"] [error="[UnexpectedError] get authorization failed, errcode:UNAVAILABLE"] [stack="github.com/milvus-io/milvus/internal/querynode.(*QueryNode).Init.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/query_node.go:348\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/milvus-io/milvus/internal/querynode.(*QueryNode).Init\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/query_node.go:297\ngithub.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:133\ngithub.com/milvus-io/milvus/internal/distributed/querynode.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:213\ngithub.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/query_node.go:54\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:120"]
[2023/06/07 18:22:59.719 +00:00] [ERROR] [querynode/service.go:134] ["QueryNode init error: "] [error="[UnexpectedError] get authorization failed, errcode:UNAVAILABLE"] [stack="github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:134\ngithub.com/milvus-io/milvus/internal/distributed/querynode.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:213\ngithub.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/query_node.go:54\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:120"]
panic: [UnexpectedError] get authorization failed, errcode:UNAVAILABLE
goroutine 194 [running]:
github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run(0x5ba3400?)
/go/src/github.com/milvus-io/milvus/cmd/components/query_node.go:55 +0x56
github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:120 +0x182
created by github.com/milvus-io/milvus/cmd/roles.runComponent[...]
/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:104 +0x18a
Anything else?
No response
/assign @locustbaby /unassign
Hi @locustbaby, I don't see this error in the log, though it seems to be the one actually causing the issue:
https://github.com/milvus-io/milvus/blob/9ffcd53bd41af26c34d4308b7d48a64d19acc118/internal/core/src/storage/MinioChunkManager.cpp#L125-L132
To clarify: does Milvus support using GCS as the `externalS3` when not running on GKE or GCE (and therefore without IAM)?
@punkerpunker It doesn't support using GCS without IAM.
Feel free to contribute if anyone has this requirement.
I'm also facing issues getting GCS to work as `externalS3`. Some components cannot work with IAM enabled, while other components cannot work with IAM disabled.
For example, when I set `useIAM: true`, `dataNode` fails with `Access denied`:
[WARN] [storage/minio_chunk_manager.go:203] ["failed to put object"] [path=insert_log/442152882448630211/442152882448630212/442152882448830296/0/442152882448830306] [error="Access denied."]
...
[WARN] [datanode/flush_task.go:230] ["flush task error detected"] [error="All attempts results:\nattempt #1:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #2:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #3:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #4:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #5:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #6:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #7:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #8:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #9:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #10:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\n"] []
[ERROR] [datanode/flush_manager.go:759] ["flush pack with error, DataNode quit now"] [error="execution failed"] [stack="github.com/milvus-io/milvus/internal/datanode.flushNotifyFunc.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flush_manager.go:759\ngithub.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).waitFinish\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:204"]
panic: execution failed
Then when I set `useIAM: false`, `dataNode` is able to flush the segment. However, the following components fail:
- `queryNode`: `[ERROR] [querynode/service.go:134] ["QueryNode init error: "] [error="[UnexpectedError] google cloud only support iam mode now"]`
- `indexNode`: `[ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] google cloud only support iam mode now"]`
In order to overcome these issues, I had to configure `querynode`, `indexnode` and `indexcoord` with `useIAM: true` while setting `useIAM: false` globally for the other components. I've done this by copying and overriding the Milvus config ConfigMap object and attaching it to the affected nodes' deployments. I'm not sure whether we can override configs through `extraEnv` for each component instead.
Here are the related helm values configs that I've used:
```yaml
minio:
  enabled: false

externalS3:
  enabled: true # Enable or disable external S3 false
  host: "storage.googleapis.com" # The host of the external S3 unset
  port: 443 # The port of the external S3 unset
  accessKey: "***" # The Access Key of the external S3 unset
  secretKey: "***" # The Secret Key of the external S3 unset
  bucketName: "bucket-name" # The Bucket Name of the external S3 unset
  useSSL: true # If true, use SSL to connect to the external S3 false
  useIAM: false # If true, use iam to connect to the external S3 false
  cloudProvider: "gcp"

dataNode:
  extraEnv:
    - name: "GOOGLE_APPLICATION_CREDENTIALS"
      valueFrom:
        secretKeyRef:
          name: minio-gcs-secret
          key: gcs_key.json

dataCoordinator:
  extraEnv:
    - name: "GOOGLE_APPLICATION_CREDENTIALS"
      valueFrom:
        secretKeyRef:
          name: minio-gcs-secret
          key: gcs_key.json

indexNode:
  extraEnv:
    - name: "GOOGLE_APPLICATION_CREDENTIALS"
      valueFrom:
        secretKeyRef:
          name: minio-gcs-secret
          key: gcs_key.json

indexCoord:
  extraEnv:
    - name: "GOOGLE_APPLICATION_CREDENTIALS"
      valueFrom:
        secretKeyRef:
          name: minio-gcs-secret
          key: gcs_key.json

queryNode:
  extraEnv:
    - name: "GOOGLE_APPLICATION_CREDENTIALS"
      valueFrom:
        secretKeyRef:
          name: minio-gcs-secret
          key: gcs_key.json
```
The secret is created using the generated key of the IAM service account:
```sh
kubectl create secret generic minio-gcs-secret --from-file=gcs_key.json=minio-gcs-key.json
```
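Side note: `GOOGLE_APPLICATION_CREDENTIALS` conventionally points to a key file on disk rather than holding the key contents, so an alternative would be to mount the secret as a file and point the variable at the mounted path. A rough sketch of plain Kubernetes pod-spec fragments (the volume name and mount path are hypothetical, not taken from the chart):
```yaml
# Sketch only: plain Kubernetes pod-spec fragments, assuming the minio-gcs-secret
# created above; the volume name and mount path are hypothetical.
volumes:
  - name: gcs-credentials
    secret:
      secretName: minio-gcs-secret
containers:
  - name: querynode
    volumeMounts:
      - name: gcs-credentials
        mountPath: /etc/gcp
        readOnly: true
    env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/gcp/gcs_key.json  # path to the mounted key file
```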
I'm using these commands to patch deployments:
```sh
kubectl patch cm RELEASE_NAME-milvus -p '{"metadata":{ "name":"RELEASE_NAME-milvus-iam"}}' --dry-run=client -o yaml -n NAMESPACE | sed 's/useIAM: false/useIAM: true/g' | kubectl apply -f -

kubectl get deployment RELEASE_NAME-milvus-querynode -o yaml -n NAMESPACE | sed 's/name: RELEASE_NAME-milvus$/name: RELEASE_NAME-milvus-iam/g' > RELEASE_NAME-milvus-querynode-deployment.yaml
kubectl delete deployment RELEASE_NAME-milvus-querynode -n NAMESPACE
kubectl apply -f RELEASE_NAME-milvus-querynode-deployment.yaml -n NAMESPACE

kubectl get deployment RELEASE_NAME-milvus-indexnode -o yaml -n NAMESPACE | sed 's/name: RELEASE_NAME-milvus$/name: RELEASE_NAME-milvus-iam/g' > RELEASE_NAME-milvus-indexnode-deployment.yaml
kubectl delete deployment RELEASE_NAME-milvus-indexnode -n NAMESPACE
kubectl apply -f RELEASE_NAME-milvus-indexnode-deployment.yaml -n NAMESPACE

kubectl get deployment RELEASE_NAME-milvus-indexcoord -o yaml -n NAMESPACE | sed 's/name: RELEASE_NAME-milvus$/name: RELEASE_NAME-milvus-iam/g' > RELEASE_NAME-milvus-indexcoord-deployment.yaml
kubectl delete deployment RELEASE_NAME-milvus-indexcoord -n NAMESPACE
kubectl apply -f RELEASE_NAME-milvus-indexcoord-deployment.yaml -n NAMESPACE
```
All the above issues are now resolved. However, `indexNode` is still failing to upload the index:
[INFO] [indexnode/task.go:346] ["Successfully build index"] [buildID=442219743964239016] [Collection=442219743964038755] [SegmentID=442219743964238988]
terminate called after throwing an instance of 'milvus::storage::S3ErrorException' what(): Error:PutObjectBuffer:AccessDenied Access denied.
SIGABRT: abort
It seems that there is still an authentication issue: `Error:PutObjectBuffer:AccessDenied`.
@zwd1208 any recommendations?
@ahmed-mahran hi, sorry, a bit confused: I guess the second `useIAM` should be `false`?
-> And may I know which auth method you are using now, IAM or ak/sk?
As our engineer says, Milvus only supports GCS with IAM now; that's the reason querynode and indexnode throw the error `google cloud only support iam mode now`.
> @ahmed-mahran hi, sorry, a bit confused: I guess the second `useIAM` should be `false`?

You are right. I've edited my comment.

> And may I know which auth method you are using now, IAM or ak/sk?
I'm using a hybrid mode:
- `useIAM: true` for `querynode`, `indexnode` and `indexcoord`
- `useIAM: false` for the rest
Milvus supports GCS with IAM, so you can set `useIAM: true` globally.
As you said, there was an `Access denied` error when you set `useIAM: true` globally. Can you check your IAM configuration?
I'm setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable:
```yaml
extraEnv:
  - name: "GOOGLE_APPLICATION_CREDENTIALS"
    valueFrom:
      secretKeyRef:
        name: minio-gcs-secret
        key: gcs_key.json
```
The key is generated for a service account with admin privileges.
@ahmed-mahran Have you tried restarting the cluster? Still stuck? Can you try standalone mode with GCS IAM?
I've tried standalone with GCS IAM and I'm getting the same errors:
[WARN] [storage/minio_chunk_manager.go:203] ["failed to put object"] [path=insert_log/442490109151675611/442490109151675612/442490109151875757/0/442490186934517781] [error="Access denied."]
[WARN] [datanode/flush_task.go:230] ["flush task error detected"] [error="All attempts results:\nattempt #1:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #2:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #3:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #4:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #5:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #6:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #7:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #8:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #9:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\nattempt #10:All attempts results:\nattempt #1:Access denied.\nattempt #2:Access denied.\nattempt #3:Access denied.\nattempt #4:Access denied.\nattempt #5:Access denied.\n\n"] []
[ERROR] [datanode/flush_manager.go:759] ["flush pack with error, DataNode quit now"] [error="execution failed"] [stack="github.com/milvus-io/milvus/internal/datanode.flushNotifyFunc.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flush_manager.go:759\ngithub.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).waitFinish\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:204"]
panic: execution failed
Please note that with the hybrid authentication mode (https://github.com/milvus-io/milvus/issues/24727#issuecomment-1596850829) that I've tested in cluster mode, `dataNode` was able to write the segment to GCS; however, `indexNode` is failing to put the index:
[INFO] [indexnode/task.go:346] ["Successfully build index"] [buildID=442219743964239016] [Collection=442219743964038755] [SegmentID=442219743964238988]
terminate called after throwing an instance of 'milvus::storage::S3ErrorException' what(): Error:PutObjectBuffer:AccessDenied Access denied.
SIGABRT: abort
/assign @haorenfsa could you please help these folks with the GCP access problem?
Hi @ahmed-mahran,
The module in Milvus that controls the object storage is called the chunk manager. There are 2 types of chunk managers in Milvus: the Golang chunk manager in the Go code, and the Cpp chunk manager in the C++ code.
The Golang chunk manager supports GCS well, whether `useIAM` is set or not; however, the Cpp chunk manager for now only supports `useIAM`.
In previous versions, only the diskANN index used the Cpp chunk manager, so it worked well. Now we're switching to use the Cpp chunk manager only, and that's why things go wrong.
I also see that you're trying to use `GOOGLE_APPLICATION_CREDENTIALS`. However, our C++ code uses the AWS SDK, so it's not supported.
Here are a couple of solutions you can choose from for now:
- Use a MinIO GCS gateway to proxy all requests, which supports `GOOGLE_APPLICATION_CREDENTIALS`. The detailed steps were stated in our former docs: https://milvus.io/docs/v2.1.x/gcp.md
- Use IAM access for all Milvus components: you'll need to create a GCP service account, assign it the required authority, and add annotations to the Kubernetes service account (check GCP's docs for the configuration: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity). Finally, make sure the Milvus pods use that service account (see the sketch after this list).
- Use an old version of Milvus which only uses the Golang chunk manager, v2.2.6 or earlier (if you're not going to use diskANN).
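For option 2, a minimal sketch of the usual Workload Identity wiring, assuming placeholder names (PROJECT_ID, NAMESPACE, milvus-gsa, milvus-ksa) and that Workload Identity is already enabled on the cluster and node pool:
```sh
# Sketch only; PROJECT_ID, NAMESPACE, milvus-gsa and milvus-ksa are placeholders.

# 1. Create a Google service account and grant it access to the GCS bucket.
gcloud iam service-accounts create milvus-gsa --project PROJECT_ID
gsutil iam ch "serviceAccount:milvus-gsa@PROJECT_ID.iam.gserviceaccount.com:roles/storage.objectAdmin" gs://milvus-dev

# 2. Allow the Kubernetes service account to impersonate the Google service account.
gcloud iam service-accounts add-iam-policy-binding \
  milvus-gsa@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/milvus-ksa]"

# 3. Annotate the Kubernetes service account used by the Milvus pods.
kubectl annotate serviceaccount milvus-ksa -n NAMESPACE \
  iam.gke.io/gcp-service-account=milvus-gsa@PROJECT_ID.iam.gserviceaccount.com
```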
We'll fix it soon and make the full functionality available in the next release.
Thanks for the detailed answer, @haorenfsa
> Use a MinIO GCS gateway to proxy all requests, which supports GOOGLE_APPLICATION_CREDENTIALS. The detailed steps were stated in our former docs: https://milvus.io/docs/v2.1.x/gcp.md

This was the first thing I tried. However, MinIO crashes, as the GCS gateway feature has been deprecated and removed: https://blog.min.io/deprecation-of-the-minio-gateway/. I guess I would need to find an older, compatible version of MinIO.
> Use IAM access for all Milvus components: you'll need to create a GCP service account, assign it the required authority, and add annotations to the Kubernetes service account (check GCP's docs for the configuration: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity). Finally, make sure the Milvus pods use that service account.

I've also tried Workload Identity, but unfortunately it didn't work. I've verified that my setup is OK following https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#verify_the_setup. I'm not able to provide many details on this as I don't have the logs, but from my search history I can tell that I was getting `Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist)`. Both the Kubernetes and Google service accounts were given full admin privileges.
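In case it helps, `Permission 'iam.serviceAccounts.getAccessToken' denied` usually points at a missing `roles/iam.workloadIdentityUser` binding between the Kubernetes service account and the Google service account. A rough way to inspect it (GSA_EMAIL, PROJECT_ID, NAMESPACE and KSA_NAME are placeholders, not values from this setup):
```sh
# Placeholders: GSA_EMAIL, PROJECT_ID, NAMESPACE, KSA_NAME.

# List who is allowed to impersonate the Google service account.
gcloud iam service-accounts get-iam-policy GSA_EMAIL

# The policy should contain a binding like:
#   role: roles/iam.workloadIdentityUser
#   members:
#     - serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]

# Confirm the Kubernetes service account is annotated with the GSA.
kubectl get serviceaccount KSA_NAME -n NAMESPACE \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
```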
> Use an old version of Milvus which only uses the Golang chunk manager, v2.2.6 or earlier (if you're not going to use diskANN).

Not sure whether multi-tenancy and RBAC through databases were supported then.
> We'll fix it soon and make the full functionality available in the next release.

That's good news! I think I'll wait until the next release, use a nightly version, or apply the patch and build my own version.
@ahmed-mahran Thank you for the patience 😂. About solution 2, it should be working; our service on GCP also adopts this method. You can check that your configuration is correct by running `kubectl exec -it <pod> -- bash` to get into the Milvus pod and executing the following commands:
```sh
# acquire an identity token from the GCP metadata server
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
```
Find the `token` field in the output, then copy it and execute:
```sh
export token=<token>
export bucket=<my-bucket>
# check if you can now list objects in the bucket
curl "https://storage.googleapis.com/$bucket?list-type=2&prefix=" -H "Authorization: Bearer $token"
```
If all your configuration is correct, commands above should be working. If not, you can diagnose the problem with the output hints.
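Since the failures reported above happened on writes (`PutObjectBuffer:AccessDenied`) rather than reads, it may also be worth testing an upload with the same token; a rough sketch using the GCS JSON API simple upload endpoint (the object name is just a placeholder):
```sh
# Reuse $token and $bucket from above; test-object.txt is a placeholder name.
echo "hello" > /tmp/test-object.txt
curl -X POST --data-binary @/tmp/test-object.txt \
  -H "Authorization: Bearer $token" \
  -H "Content-Type: text/plain" \
  "https://storage.googleapis.com/upload/storage/v1/b/$bucket/o?uploadType=media&name=test-object.txt"
```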
From my experience, it's likely one of these steps went wrong. I can help you check if you'd like to provide your NAMESPACE & KSA_NAME.
And I've tested the fix patch on GCP; it should be merged soon.