sample-apps
When deploying Vespa HA on self-built Kubernetes, Vespa reports that host names cannot be resolved.
I followed this guide to deploy Vespa on a three-node self-built Kubernetes cluster: https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA/gke
The only change: since the cluster runs locally, I changed the Service type in the two configuration files service-query.yml and service-feed.yml from the default LoadBalancer to ClusterIP.
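For context: Kubernetes gives each StatefulSet pod a stable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`, where `<service>` is the headless Service named in the StatefulSet's `serviceName`. Judging from the host names in the warnings below, that Service is `vespa-internal` here. A small sketch of how the names are composed; `pod_fqdn` is a hypothetical helper, and `cluster.local` is assumed as the cluster domain since clusters can be configured with a different one.

```shell
# Builds the DNS name Kubernetes gives a StatefulSet pod through its
# governing headless Service: <pod>.<service>.<namespace>.svc.<domain>.
# Namespace defaults to "default"; the cluster domain defaults to
# cluster.local (an assumption, it is configurable per cluster).
pod_fqdn() {
  local pod=$1 service=$2 namespace=${3:-default} domain=${4:-cluster.local}
  printf '%s.%s.%s.svc.%s\n' "$pod" "$service" "$namespace" "$domain"
}
```

For example, `pod_fqdn vespa-admin-0 vespa-internal` prints `vespa-admin-0.vespa-internal.default.svc.cluster.local`, the first name in the warnings. Running `getent hosts "$(pod_fqdn vespa-admin-0 vespa-internal)"` inside a pod (if `getent` is available in the image) shows whether cluster DNS resolves it.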
How can I solve this?
After deployment, the pods look like this:
vespa-admin-0 1/1 Running 0 18m
vespa-configserver-0 1/1 Running 0 45m
vespa-configserver-1 1/1 Running 0 45m
vespa-configserver-2 1/1 Running 0 45m
vespa-content-0 1/1 Running 0 18m
vespa-content-1 1/1 Running 0 18m
vespa-feed-container-0 1/1 Running 0 18m
vespa-feed-container-1 1/1 Running 0 18m
vespa-query-container-0 1/1 Running 0 18m
vespa-query-container-1 1/1 Running 0 18m
After deployment, the pods keep logging the following warnings.
vespa-configserver-0:
[2024-03-12 09:24:25.799] WARNING logd logdemon.vespalib.net.async_resolver could not resolve host name: 'vespa-admin-0.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:25:30.805] WARNING logd logdemon.vespalib.net.async_resolver could not resolve host name: 'vespa-admin-0.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.940] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-content-0.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.943] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-content-1.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.945] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-feed-container-0.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.947] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-feed-container-1.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.949] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-query-container-0.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:26:33.951] WARNING config-sentinel sentinel.vespalib.net.async_resolver could not resolve host name: 'vespa-query-container-1.vespa-internal.default.svc.cluster.local'
[2024-03-12 09:27:45.620] WARNING slobrok vespa-slobrok.slobrok.server.exchange_manager Peer slobrok at tcp/vespa-configserver-1.vespa-internal.default.svc.cluster.local:19099 may have problems, differences from consensus map: \nmissing: music/search/cluster.music/1/realtimecontroller->tcp/vespa-content-1.vespa-internal.default.svc.cluster.local:19103\nmissing: storage/cluster.music/distributor/1->tcp/vespa-content-1.vespa-internal.default.svc.cluster.local:19110\nmissing: storage/cluster.music/distributor/1/default->tcp/vespa-content-1.vespa-internal.default.svc.cluster.local:19109\nmissing: storage/cluster.music/storage/0->tcp/vespa-content-0.vespa-internal.default.svc.cluster.local:19101\nmissing: storage/cluster.music/storage/0/default->tcp/vespa-content-0.vespa-internal.default.svc.cluster.local:19100\nmissing: storage/cluster.music/storage/1->tcp/vespa-content-1.vespa-internal.default.svc.cluster.local:19101\nmissing: storage/cluster.music/storage/1/default->tcp/vespa-content-1.vespa-internal.default.svc.cluster.local:19100
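The resolver warnings repeat per node, so it can help to reduce a log dump to the distinct names that failed. A minimal sketch that works on log text in the format shown above; `unresolved_hosts` is a hypothetical helper, and it only matches the `could not resolve host name: '...'` wording seen in these logs.

```shell
# Prints each distinct host name that vespalib's async_resolver reported
# as unresolvable. Reads log text on stdin. Several entries may share one
# line (as in a pasted log) because grep -o emits every match separately;
# cut then takes the text between the first pair of single quotes.
unresolved_hosts() {
  grep -o "could not resolve host name: '[^']*'" \
    | cut -d "'" -f 2 \
    | sort -u
}
```

Any saved log in this format can be piped in, e.g. `unresolved_hosts < vespa-configserver-0.log` (the file name is only an example).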
vespa-content-1:
[2024-03-12 09:27:13.062] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:13.062] WARNING config-sentinel sentinel.sentinel.env Bad network connectivity (try 11)
[2024-03-12 09:27:16.064] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:16.064] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:19.065] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:19.065] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:22.067] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:22.067] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:25.068] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:25.068] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:28.069] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:28.070] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:31.071] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:31.071] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:34.081] WARNING config-sentinel sentinel.sentinel.connectivity 5 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:37.089] WARNING config-sentinel sentinel.sentinel.connectivity 5 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:40.097] WARNING config-sentinel sentinel.sentinel.connectivity 5 of 10 nodes up but with network connectivity problems (max is 1)
vespa-feed-container-1:
[2024-03-12 09:27:06.592] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:09.594] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:09.594] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:09.594] WARNING config-sentinel sentinel.sentinel.env Bad network connectivity (try 11)
[2024-03-12 09:27:12.595] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:12.595] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:15.597] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:15.597] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:18.598] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:18.598] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:21.599] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:21.599] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:24.601] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:24.601] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:27.602] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:27.602] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:30.604] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:30.604] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:33.606] WARNING config-sentinel sentinel.sentinel.connectivity 4 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:33.606] WARNING config-sentinel sentinel.sentinel.connectivity Only 1 of 10 nodes are up and OK, 10.0% (min is 50%)
[2024-03-12 09:27:36.614] WARNING config-sentinel sentinel.sentinel.connectivity 5 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:39.617] WARNING config-sentinel sentinel.sentinel.connectivity 2 of 10 nodes up but with network connectivity problems (max is 1)
[2024-03-12 09:27:39.617] WARNING config-sentinel sentinel.sentinel.env Bad network connectivity (try 21)
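The config-sentinel warnings quote two limits: at least 50% of nodes "up and OK", and at most 1 node "up but with network connectivity problems"; while they are not met, the sentinel appears to keep retrying ("Bad network connectivity (try N)"). The sketch below only restates those two limits as worded in the log, to show why 1 OK node and 4 problem nodes out of 10 fail both. It is not the sentinel's actual implementation, and `sentinel_limits_met` is a hypothetical name.

```shell
# Restates the two limits quoted in the config-sentinel warnings:
#   "Only X of N nodes are up and OK, ...% (min is 50%)"
#   "Y of N nodes up but with network connectivity problems (max is 1)"
# Arguments: X (up and OK), Y (up with problems), N (total nodes).
# Returns 0 only if X/N >= 50% and Y <= 1; integer math avoids rounding.
sentinel_limits_met() {
  local ok=$1 with_problems=$2 total=$3
  [ $((ok * 100)) -ge $((total * 50)) ] && [ "$with_problems" -le 1 ]
}
```

With the numbers from the logs, `sentinel_limits_met 1 4 10 || echo "limits not met"` prints `limits not met`.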
Hi, does the sample app work if you run it with no changes?
Hello kkraune, I think the current problem is that DNS names cannot be resolved inside the containers. I only modified the Services, which should not have much impact on the containers themselves.
Hi kkraune, how should Vespa be deployed on self-built Kubernetes? Is there a better reference document or a Helm chart available, rather than the GKE guide?
Hi, my question was, can you make the guide work without changing anything in the source files? It is hard to help otherwise.
You can also try https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/basic-search-on-gke
I am sorry we do not have better examples and documentation on this, submissions welcome :-)
Vespa requires the network, including DNS, to work before nodes are started, so the usual problem is that people try to start the containers and Vespa at the same time.
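One way to act on this is to hold back the Vespa start until the names it needs resolve, for example in a wrapper script or an init container. A minimal sketch, assuming `getent` is available in the image; `wait_for_dns`, `RESOLVE_CMD`, `DNS_TRIES` and `DNS_SLEEP` are hypothetical names, not part of Vespa or the sample app. If the names come from a headless Service, it may also be worth checking whether it sets `publishNotReadyAddresses: true`, since by default Kubernetes publishes DNS records only for ready pods, which can keep peers from resolving each other during startup.

```shell
# Waits until every host name passed as an argument resolves, polling with
# RESOLVE_CMD (default: getent hosts). Gives up after DNS_TRIES attempts per
# name (default 60), sleeping DNS_SLEEP seconds (default 2) between attempts.
wait_for_dns() {
  local host n
  for host in "$@"; do
    n=0
    until ${RESOLVE_CMD:-getent hosts} "$host" > /dev/null 2>&1; do
      n=$((n + 1))
      if [ "$n" -ge "${DNS_TRIES:-60}" ]; then
        echo "giving up: $host does not resolve" >&2
        return 1
      fi
      sleep "${DNS_SLEEP:-2}"
    done
  done
}
```

For example, `wait_for_dns vespa-configserver-0.vespa-internal.default.svc.cluster.local vespa-configserver-1.vespa-internal.default.svc.cluster.local` before starting the Vespa services on a node.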
Lots of people are running Vespa on k8s at large scale, so it's resolvable, but we can't really provide detailed help with it.
In general you'll be better off using Vespa Cloud as the control plane rather than a generic control plane like k8s - you can still deploy it in your own AWS account/GCP project.
Hello kkraune,
Without modifying any configuration, deploying the application after installing Vespa returns the following warnings:
[root@obtest133 gke]# zip -r - . -x README.md "config/*" | curl --header Content-Type:application/zip --data-binary @- http://localhost:19071/application/v2/tenant/default/prepareandactivate
  adding: hosts.xml (deflated 83%)
  adding: schemas/ (stored 0%)
  adding: schemas/music.sd (deflated 57%)
  adding: services.xml (deflated 71%)
{
  "log": [
    {
      "time": 1710465277829,
      "level": "WARNING",
      "message": "Host named 'vespa-feed-container-0.vespa-internal.default.svc.cluster.local' may not receive any config since it differs from its canonical hostname '100-65-143-29.vespa-feed.default.svc.cluster.local' (check DNS and /etc/hosts).",
      "applicationPackage": true
    },
    {
      "time": 1710465277831,
      "level": "WARNING",
      "message": "Host named 'vespa-feed-container-1.vespa-internal.default.svc.cluster.local' may not receive any config since it differs from its canonical hostname '100-94-197-63.vespa-feed.default.svc.cluster.local' (check DNS and /etc/hosts).",
      "applicationPackage": true
    },
    {
      "time": 1710465277832,
      "level": "WARNING",
      "message": "Host named 'vespa-query-container-0.vespa-internal.default.svc.cluster.local' may not receive any config since it differs from its canonical hostname '100-65-143-15.vespa-query.default.svc.cluster.local' (check DNS and /etc/hosts).",
      "applicationPackage": true
    },
    {
      "time": 1710465277833,
      "level": "WARNING",
      "message": "Host named 'vespa-query-container-1.vespa-internal.default.svc.cluster.local' may not receive any config since it differs from its canonical hostname '100-94-197-20.vespa-query.default.svc.cluster.local' (check DNS and /etc/hosts).",
      "applicationPackage": true
    }
  ],
  "message": "Session 16 for tenant 'default' prepared and activated.",
  "session-id": "16",
  "activated": true,
  "tenant": "default",
  "url": "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default",
  "configChangeActions": {
    "restart": [],
    "refeed": [],
    "reindex": []
  }
}
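The warnings say that the name in hosts.xml differs from the node's canonical host name; typically this means a reverse lookup of the pod IP returns a different name (here one under `vespa-feed` or `vespa-query` instead of `vespa-internal`). A rough way to see a similar mismatch from inside a pod is to resolve the name forward, resolve the resulting address back, and compare. This is an approximation under assumptions: `getent` is available in the image, `canonical_name`, `check_canonical` and `RESOLVE_CMD` are hypothetical names, and Vespa's own check may derive the canonical name differently.

```shell
# Forward-resolves a name with RESOLVE_CMD (default: getent hosts), then
# reverse-resolves the first address returned and prints the first name
# that lookup reports. Returns 1 if the forward lookup yields no address.
canonical_name() {
  local ip
  ip=$(${RESOLVE_CMD:-getent hosts} "$1" | awk 'NR == 1 { print $1 }')
  [ -n "$ip" ] || return 1
  ${RESOLVE_CMD:-getent hosts} "$ip" | awk 'NR == 1 { print $2 }'
}

# Succeeds if the canonical name equals the given name; otherwise reports
# the mismatch (or the failed lookup) on stderr and returns 1.
check_canonical() {
  local canon
  canon=$(canonical_name "$1") || { echo "$1: lookup failed" >&2; return 1; }
  if [ "$canon" != "$1" ]; then
    echo "$1: canonical name is '$canon'" >&2
    return 1
  fi
}
```

For example, `check_canonical vespa-feed-container-0.vespa-internal.default.svc.cluster.local` run inside that pod.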
I just ran all the steps in https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA/gke, and it worked fine for me
Make sure all config servers are started - this takes 2-3 minutes:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
vespa-configserver-0 1/1 Running 0 12m
vespa-configserver-1 1/1 Running 0 11m
vespa-configserver-2 1/1 Running 0 10m
Then
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
vespa-admin-0 1/1 Running 0 2m29s
vespa-configserver-0 1/1 Running 0 15m
vespa-configserver-1 1/1 Running 0 14m
vespa-configserver-2 1/1 Running 0 13m
vespa-content-0 1/1 Running 0 2m28s
vespa-content-1 1/1 Running 0 2m12s
vespa-feed-container-0 1/1 Running 0 2m28s
vespa-feed-container-1 1/1 Running 0 2m27s
vespa-query-container-0 1/1 Running 0 2m28s
vespa-query-container-1 1/1 Running 0 2m26s
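Rather than reading the table by eye, a small filter can flag pods whose READY column is not `n/n` or whose STATUS is not `Running`. A sketch that assumes the default `kubectl get pods` column order shown above, including the header line; `all_pods_ready` is a hypothetical helper.

```shell
# Reads `kubectl get pods` output on stdin (header line first) and prints
# the name of every pod that is not fully ready or not Running.
# Returns 0 if every pod is ready and Running, 1 otherwise.
all_pods_ready() {
  awk 'BEGIN { bad = 0 }
       NR > 1 && NF > 0 {
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") { print $1; bad = 1 }
       }
       END { exit bad }'
}
```

For example, `kubectl get pods | all_pods_ready && echo "all pods ready"`. kubectl's own `kubectl wait --for=condition=Ready` is an alternative that blocks until pods report ready.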
Then go through all the steps in the README to run the health checks. It takes some time to start the Vespa services inside the pods:
$ curl http://localhost:19107/state/v1/health
{"status":{"code":"down","message":"matchengine: Search interface is offline"}}
$ curl http://localhost:19107/state/v1/health
{"status":{"code":"down","message":"matchengine: Search interface is offline"}}
$ curl http://localhost:19107/state/v1/health
{"status":{"code":"up"}}
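The repeated `curl` calls above can be wrapped in a loop that polls until the health endpoint reports `up`. A minimal sketch; it matches both the compact `"code":"up"` form above and the spaced `"code" : "up"` form seen in the config server response, and `wait_until_up`, `FETCH_CMD`, `HEALTH_TRIES` and `HEALTH_SLEEP` are hypothetical names.

```shell
# Polls a /state/v1/health URL with FETCH_CMD (default: curl -s) until the
# response contains a "code" of "up". Gives up after HEALTH_TRIES attempts
# (default 30), sleeping HEALTH_SLEEP seconds (default 5) in between.
# grep reads the whole response (no -q), so curl is never cut off early.
wait_until_up() {
  local url=$1 n=0
  until ${FETCH_CMD:-curl -s} "$url" \
      | grep -E '"code"[[:space:]]*:[[:space:]]*"up"' > /dev/null; do
    n=$((n + 1))
    if [ "$n" -ge "${HEALTH_TRIES:-30}" ]; then
      echo "$url did not report up" >&2
      return 1
    fi
    sleep "${HEALTH_SLEEP:-5}"
  done
}
```

For example, `wait_until_up http://localhost:19107/state/v1/health` while the port-forward from the README is running.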
So this works at least in the Google Cloud environment. Please see https://docs.vespa.ai/en/operations-selfhosted/node-setup.html#hostname and linked articles from there for the hostname problems you might have. Hope this helps
Hi kkraune, in my cluster the following checks now look normal. Does this prove that the HA deployment is ready?
1. Pod status:
[root@obtest133 ~]# kubectl get pod | grep -i vespa
vespa-admin-0 1/1 Running 0 3d1h
vespa-configserver-0 1/1 Running 0 3d2h
vespa-configserver-1 1/1 Running 0 3d2h
vespa-configserver-2 1/1 Running 0 3d2h
vespa-content-0 1/1 Running 0 3d1h
vespa-content-1 1/1 Running 0 3d1h
vespa-feed-container-0 1/1 Running 0 3d1h
vespa-feed-container-1 1/1 Running 0 3d1h
vespa-query-container-0 1/1 Running 0 3d1h
vespa-query-container-1 1/1 Running 0 3d1h
2. Cluster status check:
kubectl port-forward pod/vespa-configserver-0 19071
[root@obtest133 ~]# curl http://localhost:19071/state/v1/health
{
  "time" : 1710731499447,
  "status" : {
    "code" : "up"
  },
  "metrics" : {
    "snapshot" : {
      "from" : 1.710731438526E9,
      "to" : 1.710731498526E9
    },
    "values" : [ {
      "name" : "requestsPerSecond",
      "values" : {
        "count" : 0,
        "rate" : 0.0
      }
    }, {
      "name" : "latencySeconds",
      "values" : {
        "average" : 0.006,
        "sum" : 0.0,
        "count" : 0,
        "last" : 0.006,
        "max" : 0.006,
        "min" : 0.006,
        "rate" : 0.0
      }
    } ]
  }
}
Hello bratseth, due to company security restrictions we cannot use hosted public cloud services and must use self-built clusters.
Not when the data plane remains in an account owned by your company though?