[Bug]:"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: no Elasticsearch node available"

Open navin-rai opened this issue 2 years ago • 6 comments

What happened?

I am using AWS Elasticsearch and trying to use it with Jaeger. I have set the endpoints as per the documentation. I am using the latest version of the Jaeger Helm chart (apiVersion: v2, appVersion: 1.39.0). Below is the config I am using for Elasticsearch:

elasticsearch:
  scheme: https
  host: search-*********************.us-east-1.es.amazonaws.com
  port: 443
  user: elastic
  usePassword: true
  password: *********

Steps to reproduce

1. Add AWS ES endpoints
2. helm install jaeger

Expected behavior

Jaeger Collector and Jaeger Query should deploy properly on Kubernetes.

Relevant log output

2023/02/01 13:18:20 maxprocs: Leaving GOMAXPROCS=24: CPU quota undefined
{"level":"info","ts":1675257500.4474466,"caller":"flags/service.go:119","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1675257500.447506,"caller":"flags/service.go:125","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1675257500.4477003,"caller":"flags/admin.go:129","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1675257500.4477556,"caller":"flags/admin.go:143","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1675257500.447806,"caller":"flags/admin.go:121","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
{"level":"fatal","ts":1675257506.1148498,"caller":"./main.go:82","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: no Elasticsearch node available","stacktrace":"main.main.func1\n\t./main.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:968\nmain.main\n\t./main.go:155\nruntime.main\n\truntime/proc.go:250"}

Screenshot

No response

Additional context

No response

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Storage backend

AWS Elasticsearch

Operating system

No response

Deployment model

No response

Deployment configs

No response

navin-rai avatar Feb 01 '23 13:02 navin-rai

@navin-rai is your issue similar to this one: https://github.com/jaegertracing/helm-charts/issues/441 by any chance? I don't use OpenSearch, so I don't know what could go wrong with it when it is used with Jaeger deployed using this Helm chart.

mehta-ankit avatar Feb 01 '23 13:02 mehta-ankit

I tried the solution given in the PR, but it didn't work.

navin-rai avatar Feb 01 '23 13:02 navin-rai

@navin-rai did you enable fine-grained access control on the OpenSearch domain? If yes, then proper credentials must be provided; if not, then you can't pass a username or password as environment variables. Another thing is AWS-level policies: can you confirm that pods running in your cluster are able to correctly resolve the OpenSearch address?
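One way to check reachability from inside the cluster is a throwaway pod that calls the domain's `_cluster/health` endpoint. This is only a sketch: the pod name, the `curlimages/curl` image, and the `<your-domain-endpoint>` and `<password>` placeholders are assumptions, not values from this issue. Drop the `-u` argument if fine-grained access control is disabled.

```yaml
# Hypothetical one-off pod; name, image, and placeholders are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: es-connectivity-check
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl
      command:
        - curl
        - -sS                      # quiet, but still print errors
        - --max-time
        - "10"                     # fail fast on DNS or network problems
        - -u
        - "elastic:<password>"     # only with fine-grained access control
        - "https://<your-domain-endpoint>:443/_cluster/health"
```

After applying it, `kubectl logs es-connectivity-check` should show either a JSON health document or the curl error (for example a DNS resolution failure or a timeout), which helps separate network problems from credential problems.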

klubi avatar Feb 01 '23 14:02 klubi

@klubi, hi. Here is what I am trying to do: I have created an AWS ES domain, but my Jaeger instance is not on AWS; it is on-prem. The solution you provided gives me the manifest below for collector-deployment (similar for query-deployment).

Source: jaeger/templates/collector-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-collector
  labels:
    helm.sh/chart: jaeger-0.67.0
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.39.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: collector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/instance: jaeger
      app.kubernetes.io/component: collector
  template:
    metadata:
      annotations:
        checksum/config-env: dba5166ad9db9ba648c1032ebbd34dcd0d085b50023b839ef5c68ca1db93a563
      labels:
        app.kubernetes.io/name: jaeger
        app.kubernetes.io/instance: jaeger
        app.kubernetes.io/component: collector
    spec:
      securityContext: {}
      serviceAccountName: jaeger-collector
      containers:
      - name: jaeger-collector
        securityContext: {}
        image: jaegertracing/jaeger-collector:1.39.0
        imagePullPolicy: IfNotPresent
        args:
        env:
          - name: SPAN_STORAGE_TYPE
            value: elasticsearch
          - name: ES_SERVER_URLS
            value: https://search-elastic-****-***********.us-east-1.es.amazonaws.com:443
          - name: ES_USERNAME
            value: elastic
          - name: ES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: jaeger-elasticsearch
                key: password
          - name: ES_INDEX_PREFIX
            value: jaeger
        ports:
        - containerPort: 14250
          name: grpc
          protocol: TCP
        - containerPort: 14268
          name: http
          protocol: TCP
        - containerPort: 14269
          name: admin
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: admin
          initialDelaySeconds: 20
        livenessProbe:
          httpGet:
            path: /
            port: admin
          initialDelaySeconds: 20
        resources: {}
        volumeMounts:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      volumes:

navin-rai avatar Feb 01 '23 14:02 navin-rai

@klubi I am not sure. Is there any possibility to use an AWS Secret key & Access key?

navin-rai avatar Feb 01 '23 14:02 navin-rai

No, that's a completely different mechanism. My PR has not been merged yet, so you can't use it yet. To test your case, you can remove the lines below from the generated manifest.

- name: ES_USERNAME
  value: elastic
- name: ES_PASSWORD
  valueFrom:
    secretKeyRef:
      name: jaeger-elasticsearch
      key: password

Also, you'd have to add the following to your collector values:

cmdlineParams:
  es.tls.enabled: true
  es.tls.skip-host-verify: true
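
For orientation, here is a rough sketch of how these parameters might sit in a values file. It assumes the chart exposes `cmdlineParams` under both the `collector` and `query` sections; repeating the block under `query` is an assumption based on the query deployment being generated the same way, so check it against the chart's own values.yaml.

```yaml
# values.yaml (sketch; section layout is an assumption, verify against the chart)
collector:
  cmdlineParams:
    es.tls.enabled: true            # talk TLS to the ES endpoint
    es.tls.skip-host-verify: true   # testing only: skips certificate host verification
query:
  cmdlineParams:
    es.tls.enabled: true
    es.tls.skip-host-verify: true
```

Skipping host verification is useful for narrowing down the failure, but it is probably not something to leave enabled once the connection works.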

klubi avatar Feb 01 '23 14:02 klubi