viya4-monitoring-kubernetes
OpenSearch play fails: Exiting script [deploy_logging.sh]
I have deployed SAS Viya 4 (2023.05) on AWS using the viya4-iac-aws repo.
- branch 6.0.0 - https://github.com/sassoftware/viya4-iac-aws
- branch 6.7.0 - https://github.com/sassoftware/viya4-deployment
When deploying "components=cluster-logging" it fails. The other steps work fine, and it even gives me the OpenSearch user and password.
I'm running the playbook for "components=cluster-logging", i.e.:
make sas-deployment prefix=viya4-01 namespace=sas-viya4 components=cluster-logging
TASK [monitoring : cluster-logging - deploy]
task path: /viya4-deployment/roles/monitoring/tasks/cluster-logging.yaml:94
fatal: [localhost]: FAILED! => changed=true
cmd:
- /tmp/ansible.1xg7lu0y/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh
delta: '0:00:24.326475'
end: '2023-07-20 11:57:52.839824'
invocation:
module_args:
_raw_params: /tmp/ansible.1xg7lu0y/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh
_uses_shell: false
argv: null
chdir: null
creates: null
executable: null
removes: null
stdin: null
stdin_add_newline: true
strip_empty_ends: true
warn: true
msg: non-zero return code
rc: 1
start: '2023-07-20 11:57:28.513349'
stderr: |-
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
warning: error calculating patch from openapi spec: map: map[] does not contain declared merge key: name
error: error when applying patch:
to:
Resource: "apps/v1, Resource=deployments", GroupVersionKind: "apps/v1, Kind=Deployment"
Name: "eventrouter", Namespace: "logging"
for: "/tmp/sas.mon.lwen61fu/logging/eventrouter.yaml": error when patching "/tmp/sas.mon.lwen61fu/logging/eventrouter.yaml": creating patch with:
original:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","v4m.sas.com/name":"viya4-monitoring-kubernetes"},"name":"eventrouter","namespace":"logging"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"eventrouter"}},"template":{"metadata":{"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","tier":"control-plane-addons","v4m.sas.com/name":"viya4-monitoring-kubernetes"}},"spec":{"automountServiceAccountToken":true,"containers":[{"image":"gcr.io/heptio-images/eventrouter:v0.3","imagePullPolicy":"IfNotPresent","name":"kube-eventrouter","volumeMounts":[{"mountPath":"/etc/eventrouter","name":"config-volume"}]}],"imagePullSecrets":[{"name":null}],"serviceAccount":"eventrouter","volumes":[{"configMap":{"name":"eventrouter-cm"},"name":"config-volume"}]}}}}
modified:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"eventrouter\",\"kubernetes.io/name\":\"eventrouter\",\"v4m.sas.com/name\":\"viya4-monitoring-kubernetes\"},\"name\":\"eventrouter\",\"namespace\":\"logging\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app\":\"eventrouter\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"eventrouter\",\"kubernetes.io/name\":\"eventrouter\",\"tier\":\"control-plane-addons\",\"v4m.sas.com/name\":\"viya4-monitoring-kubernetes\"}},\"spec\":{\"automountServiceAccountToken\":true,\"containers\":[{\"image\":\"gcr.io/heptio-images/eventrouter:v0.3\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"kube-eventrouter\",\"volumeMounts\":[{\"mountPath\":\"/etc/eventrouter\",\"name\":\"config-volume\"}]}],\"imagePullSecrets\":[{\"name\":null}],\"serviceAccount\":\"eventrouter\",\"volumes\":[{\"configMap\":{\"name\":\"eventrouter-cm\"},\"name\":\"config-volume\"}]}}}}\n"},"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","v4m.sas.com/name":"viya4-monitoring-kubernetes"},"name":"eventrouter","namespace":"logging"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"eventrouter"}},"template":{"metadata":{"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","tier":"control-plane-addons","v4m.sas.com/name":"viya4-monitoring-kubernetes"}},"spec":{"automountServiceAccountToken":true,"containers":[{"image":"gcr.io/heptio-images/eventrouter:v0.3","imagePullPolicy":"IfNotPresent","name":"kube-eventrouter","volumeMounts":[{"mountPath":"/etc/eventrouter","name":"config-volume"}]}],"imagePullSecrets":[{"name":null}],"serviceAccount":"eventrouter","volumes":[{"configMap":{"name":"eventrouter-cm"},"name":"config-volume"}]}}}}
current:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"deployment.kubernetes.io/revision":"1","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"eventrouter\",\"kubernetes.io/name\":\"eventrouter\",\"v4m.sas.com/name\":\"viya4-monitoring-kubernetes\"},\"name\":\"eventrouter\",\"namespace\":\"logging\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app\":\"eventrouter\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"eventrouter\",\"kubernetes.io/name\":\"eventrouter\",\"tier\":\"control-plane-addons\",\"v4m.sas.com/name\":\"viya4-monitoring-kubernetes\"}},\"spec\":{\"automountServiceAccountToken\":true,\"containers\":[{\"image\":\"gcr.io/heptio-images/eventrouter:v0.3\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"kube-eventrouter\",\"volumeMounts\":[{\"mountPath\":\"/etc/eventrouter\",\"name\":\"config-volume\"}]}],\"imagePullSecrets\":[{\"name\":null}],\"serviceAccount\":\"eventrouter\",\"volumes\":[{\"configMap\":{\"name\":\"eventrouter-cm\"},\"name\":\"config-volume\"}]}}}}\n"},"creationTimestamp":"2023-07-19T16:05:09Z","generation":1,"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","v4m.sas.com/name":"viya4-monitoring-kubernetes"},"name":"eventrouter","namespace":"logging","resourceVersion":"272586","uid":"9ea74edd-4637-4cdf-9bf3-968429ded57b"},"spec":{"progressDeadlineSeconds":600,"replicas":1,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app":"eventrouter"}},"strategy":{"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"eventrouter","kubernetes.io/name":"eventrouter","tier":"control-plane-addons","v4m.sas.com/name":"viya4-monitoring-kubernetes"}},"spec":{"automountServiceAccountToken":true,"containers":[{"image":"gcr.io/heptio-images/eventrouter:v0.3","imagePullPolicy":"IfNotPresent","name":"kube-eventrouter","resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/etc/eventrouter","name":"config-volume"}]}],"dnsPolicy":"ClusterFirst","imagePullSecrets":[{}],"restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"serviceAccount":"eventrouter","serviceAccountName":"eventrouter","terminationGracePeriodSeconds":30,"volumes":[{"configMap":{"defaultMode":420,"name":"eventrouter-cm"},"name":"config-volume"}]}}},"status":{"availableReplicas":1,"conditions":[{"lastTransitionTime":"2023-07-19T16:05:09Z","lastUpdateTime":"2023-07-19T16:05:13Z","message":"ReplicaSet \"eventrouter-55df4c4644\" has successfully progressed.","reason":"NewReplicaSetAvailable","status":"True","type":"Progressing"},{"lastTransitionTime":"2023-07-20T08:01:24Z","lastUpdateTime":"2023-07-20T08:01:24Z","message":"Deployment has minimum availability.","reason":"MinimumReplicasAvailable","status":"True","type":"Available"}],"observedGeneration":1,"readyReplicas":1,"replicas":1,"updatedReplicas":1}}
for:: map: map[] does not contain declared merge key: name
stderr_lines:
stdout: |-
INFO User directory: /tmp/ansible.1xg7lu0y
INFO Helm client version: 3.11.3
INFO Kubernetes client version: v1.25.9
INFO Kubernetes server version: v1.25.11-eks-a5565ad
Deploying logging components to the [logging] namespace [Thu Jul 20 11:57:48 UTC 2023]
INFO Deploying Event Router ...
serviceaccount/eventrouter unchanged
clusterrole.rbac.authorization.k8s.io/eventrouter unchanged
clusterrolebinding.rbac.authorization.k8s.io/eventrouter unchanged
configmap/eventrouter-cm unchanged
ERROR Exiting script [deploy_logging.sh] due to an error executing the command [logging/bin/deploy_eventrouter.sh].
stdout_lines:
@cumcke any idea what the issue could be?
@biohazd Were you deploying into an entirely new cluster or were you updating SAS Viya Monitoring on an existing cluster? The messages you shared look like they were attempting to update an existing deployment.
It is a new cluster. I ran the script a few times.
I have tried it on a new cluster as well, and it still fails.
Is there any way to debug it further or get a more verbose error message?
Unfortunately, when deploying our project via the Viya 4 Deployment project, we lose some visibility and access. If the cluster is up and running, you could deploy the monitoring components by using our deployment tooling directly. All you would need is a Linux shell with access to the cluster and a kube config file with full admin access to the cluster. That would probably make it easier to debug this problem.
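For reference, running our tooling directly looks roughly like this (a minimal sketch; the USER_DIR path is just an example and can be omitted if you have no customizations):

```bash
# Assumes kubectl and helm are installed and KUBECONFIG points to a
# kube config file with cluster-admin access to the cluster.
git clone https://github.com/sassoftware/viya4-monitoring-kubernetes.git
cd viya4-monitoring-kubernetes

# Optional: keep customizations/output in a persistent directory (example path)
export USER_DIR=~/v4m-user-dir

# Deploy the logging stack (the same script the Ansible task invokes)
./logging/bin/deploy_logging.sh
```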
In the meantime, I have some additional questions:
- Are you using a custom USER_DIR directory (via the new V4M_CUSTOM_CONFIG_USER_DIR configuration variable)?
- Does this cluster have limited internet access or any sort of networking limitations?
I would recommend you remove any of the logging components already deployed before trying again. To do that, run the uninstall task that corresponds to the cluster-logging deploy task you've been running. Or, alternatively, you should be able to delete the "logging" namespace. That will give you a clean environment. Once you've done that, re-run the deployment task and, if it fails, share the log output you get at that point. That might give us a better idea of what's going on.
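If you go the namespace route, that amounts to something like this:

```bash
# Remove the partially deployed logging stack for a clean slate
kubectl delete namespace logging

# Wait until this reports NotFound before re-running the deployment task
kubectl get namespace logging
```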
Thanks. I will try that.
I'm not using a custom USER_DIR and there should be no network restrictions.
Thanks, it did deploy and is working, but it still shows those errors.
TASK [monitoring : cluster-logging - deploy] ***********************************
task path: /viya4-deployment/roles/monitoring/tasks/cluster-logging.yaml:94
fatal: [localhost]: FAILED! => changed=true
cmd:
- /tmp/ansible.rewnutat/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh
delta: '0:09:23.592197'
end: '2023-07-21 09:16:22.959295'
invocation:
module_args:
_raw_params: /tmp/ansible.rewnutat/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh
_uses_shell: false
argv: null
chdir: null
creates: null
executable: null
removes: null
stdin: null
stdin_add_newline: true
strip_empty_ends: true
warn: true
msg: non-zero return code
rc: 1
start: '2023-07-21 09:06:59.367098'
stderr: |-
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Warning: spec.template.spec.imagePullSecrets[0].name: invalid empty name ""
environment: line 2: [: ==: unary operator expected
environment: line 2: [: ==: unary operator expected
environment: line 2: [: ==: unary operator expected
stderr_lines:
stdout: |-
INFO User directory: /tmp/ansible.rewnutat
INFO Helm client version: 3.11.3
INFO Kubernetes client version: v1.25.9
INFO Kubernetes server version: v1.25.11-eks-a5565ad
namespace/logging created
serviceaccount/default patched
Deploying logging components to the [logging] namespace [Fri Jul 21 09:07:28 UTC 2023]
INFO Deploying Event Router ...
serviceaccount/eventrouter created
clusterrole.rbac.authorization.k8s.io/eventrouter created
clusterrolebinding.rbac.authorization.k8s.io/eventrouter created
configmap/eventrouter-cm created
deployment.apps/eventrouter created
INFO Event Router has been deployed
secret/internal-user-kibanaserver labeled
secret/v4m-root-ca-tls-secret created
secret/v4m-root-ca-tls-secret annotated
secret/v4m-root-ca-tls-secret labeled
secret/kibana-tls-secret created
secret/kibana-tls-secret annotated
secret/kibana-tls-secret labeled
INFO Adding [opensearch] helm repository
"opensearch" has been added to your repositories
secret/v4m-osd-tls-enabled created
INFO Deploying OpenSearch Dashboards
Release "v4m-osd" does not exist. Installing it now.
NAME: v4m-osd
LAST DEPLOYED: Fri Jul 21 09:08:42 2023
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
- Get the application URL by running these commands:
https://dashboards.xxxx.com/
INFO OpenSearch Dashboards has been deployed
serviceaccount/v4m-osd-dashboards patched
secret/internal-user-admin labeled
secret/internal-user-logcollector labeled
secret/internal-user-metricgetter labeled
secret/es-transport-tls-secret created
secret/es-transport-tls-secret annotated
secret/es-transport-tls-secret labeled
secret/es-rest-tls-secret created
secret/es-rest-tls-secret annotated
secret/es-rest-tls-secret labeled
secret/es-admin-tls-secret created
secret/es-admin-tls-secret annotated
secret/es-admin-tls-secret labeled
secret/opensearch-cert-subjects created
secret/opensearch-cert-subjects labeled
configmap/run-securityadmin.sh created
configmap/run-securityadmin.sh labeled
"opensearch" already exists with the same configuration, skipping
secret/opensearch-securityconfig created
secret/opensearch-securityconfig labeled
INFO Deploying OpenSearch
Release "opensearch" does not exist. Installing it now.
NAME: opensearch
LAST DEPLOYED: Fri Jul 21 09:11:39 2023
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Watch all cluster members come up.
$ kubectl get pods --namespace=logging -l app.kubernetes.io/component=v4m-search -w
INFO Waiting on OpenSearch pods to be Ready
pod/v4m-search-0 condition met
Looping: 0 Fri Jul 21 09:14:57 UTC 2023 RC: 0
| run_securityadmin.sh script starting [Fri Jul 21 09:14:57 UTC 2023]
| **************************************************************************
| ** This tool will be deprecated in the next major release of OpenSearch **
| ** https://github.com/opensearch-project/security/issues/1755 **
| **************************************************************************
| Security Admin v7
| Will connect to localhost:9200 ... done
| Connected as "CN=es-admin,O=v4m"
| OpenSearch Version: 2.8.0
| Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
| Clustername: opensearch-cluster
| Clusterstate: GREEN
| Number of nodes: 3
| Number of data nodes: 3
| .opendistro_security index does not exists, attempt to create it ... done (0-all replicas)
| Populate config from /usr/share/opensearch/plugins/opensearch-security/securityconfig/
| Will update '/config' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/config.yml
| SUCC: Configuration for 'config' created or updated
| Will update '/roles' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/roles.yml
| SUCC: Configuration for 'roles' created or updated
| Will update '/rolesmapping' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/roles_mapping.yml
| SUCC: Configuration for 'rolesmapping' created or updated
| Will update '/internalusers' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml
| SUCC: Configuration for 'internalusers' created or updated
| Will update '/actiongroups' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/action_groups.yml
| SUCC: Configuration for 'actiongroups' created or updated
| Will update '/tenants' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/tenants.yml
| SUCC: Configuration for 'tenants' created or updated
| Will update '/nodesdn' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/nodes_dn.yml
| SUCC: Configuration for 'nodesdn' created or updated
| Will update '/whitelist' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/whitelist.yml
| SUCC: Configuration for 'whitelist' created or updated
| Will update '/allowlist' with /usr/share/opensearch/plugins/opensearch-security/securityconfig/allowlist.yml
| SUCC: Configuration for 'allowlist' created or updated
| SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"]) due to: null
| SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"]) due to: null
| SUCC: Expected 10 config types for node {"updated_config_types":["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"],"updated_config_size":10,"message":null} is 10 (["allowlist","tenants","rolesmapping","nodesdn","audit","roles","whitelist","internalusers","actiongroups","config"]) due to: null
| Done with success
INFO OpenSearch has been deployed
INFO Deploying Elasticsearch metric exporter ...
INFO Adding [prometheus-community] helm repository
"prometheus-community" has been added to your repositories
Release "es-exporter" does not exist. Installing it now.
NAME: es-exporter
LAST DEPLOYED: Fri Jul 21 09:15:43 2023
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
- Get the application URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace logging -l "app=v4m-es-exporter" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:9108/metrics to use your application"
kubectl port-forward $POD_NAME 9108:9108 --namespace logging
INFO Elasticsearch metric exporter has been deployed
INFO Loading Content into OpenSearch
ERROR Unable to identify the temporary port used for port-forwarding [v4m-search]; exiting script.
ERROR Exiting script [deploy_logging.sh] due to an error executing the command [logging/bin/deploy_opensearch_content.sh].
stdout_lines:
@biohazd Unfortunately, the messages indicate that the deployment did not actually complete successfully in your environment. While some pods may be up and running, the ERROR messages indicate that some of the content (e.g. the pre-built OpenSearch Dashboards, saved queries, etc.) couldn't be loaded. In addition, I suspect the Fluent Bit pods, which collect the log messages from the various Kubernetes nodes/pods, were not deployed either. So, you will have no log messages to review in OpenSearch Dashboards.
The message about "not being able to identify the temporary port used for port-forwarding" is a revealing one. I suspect OpenSearch was not up (and may still not be up) when the script ran.
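To rule that out, you can reproduce the script's temporary port-forward by hand. A rough check, assuming the default service name (v4m-search) and OpenSearch's REST port (9200); replace <password> with the admin password reported by the deployment:

```bash
kubectl -n logging get pods -l app.kubernetes.io/component=v4m-search
kubectl -n logging port-forward service/v4m-search 9200:9200 &
sleep 5
curl -k -u admin:<password> https://localhost:9200/_cluster/health
kill %1   # stop the background port-forward
```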
I suspect you are running into resource issues on your cluster. Is it possibly under-sized? We've seen these sorts of messages when that is the case.
If you can monitor the cluster during the deployment process, check the OpenSearch pod logs for error messages and/or events indicating there was a problem scheduling the pod onto a node. Another possibility is that the PVCs needed by OpenSearch weren't provisioned for some reason. All of that should be detectable by monitoring the cluster during the deployment process using a tool like OpenLens or even just using kubectl describe pod commands.
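A few kubectl checks along those lines (the pod name shown is the one from your log):

```bash
kubectl -n logging get pods                               # any Pending or CrashLoopBackOff pods?
kubectl -n logging describe pod v4m-search-0              # scheduling problems, failed volume mounts
kubectl -n logging get events --sort-by=.lastTimestamp    # recent warnings/errors in the namespace
kubectl -n logging get pvc                                # are the OpenSearch PVCs Bound?
kubectl -n logging logs v4m-search-0                      # OpenSearch startup errors
```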
@biohazd I wanted to check in with you on the status of this issue. Were you able to get everything deployed and working?
It does seem to deploy fine, but those errors are still there.
I have the same problem. I have verified I have sufficient resources in my cluster.
Release "es-exporter" does not exist. Installing it now. NAME: es-exporter LAST DEPLOYED: Fri Jun 7 17:00:34 2024 NAMESPACE: logging STATUS: deployed REVISION: 1 TEST SUITE: None NOTES:
- Get the application URL by running these commands: export POD_NAME=$(kubectl get pods --namespace logging -l "app=v4m-es-exporter" -o jsonpath="{.items[0].metadata.name}") echo "Visit http://127.0.0.1:9108/metrics to use your application" kubectl port-forward $POD_NAME 9108:9108 --namespace logging INFO Elasticsearch metric exporter has been deployed
INFO Loading Content into OpenSearch ERROR Unable to identify the temporary port used for port-forwarding [v4m-search]; exiting script. ERROR Exiting script [deploy_logging.sh] due to an error executing the command [logging/bin/deploy_opensearch_content.sh].
I see the following resources in my namespace:
NAME                                   READY   STATUS    RESTARTS   AGE
pod/v4m-es-exporter-6854fd79f7-gl72g   1/1     Running   0          4m4s
pod/v4m-osd-78f8ddfc57-j98k5           1/1     Running   0          13m
pod/v4m-search-0                       1/1     Running   0          8m24s
pod/v4m-search-1                       1/1     Running   0          8m24s
pod/v4m-search-2                       1/1     Running   0          8m24s

NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/v4m-es-exporter   ClusterIP   10.0.3.144

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/v4m-es-exporter   1/1     1            1           4m5s
deployment.apps/v4m-osd           1/1     1            1           13m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/v4m-es-exporter-6854fd79f7   1         1         1       4m5s
replicaset.apps/v4m-osd-78f8ddfc57           1         1         1       13m

NAME                          READY   AGE
statefulset.apps/v4m-search   3/3     8m27s