jaeger-operator
Connecting to an external ES created by ECK fails with error "health check timeout: no Elasticsearch node available"
Error message:
2020/12/16 04:02:17 maxprocs: Updating GOMAXPROCS=1: using minimum allowed GOMAXPROCS
{"level":"info","ts":1608091337.2560472,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1608091337.2560928,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1608091337.2562084,"caller":"flags/admin.go:121","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1608091337.2562501,"caller":"flags/admin.go:127","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1608091337.256265,"caller":"flags/admin.go:113","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
{"level":"fatal","ts":1608091342.4766536,"caller":"command-line-arguments/main.go:74","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: no Elasticsearch node available","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:74\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:864\nmain.main\n\tcommand-line-arguments/main.go:133\nruntime.main\n\truntime/proc.go:204"}
External ES 7.10.1 YAML file (ECK operator 1.3.1): https://www.elastic.co/guide/en/cloud-on-k8s/1.3/k8s-deploy-eck.html
    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: jaeger-storage
    spec:
      version: 7.10.1
      http:
        tls:
          selfSignedCertificate:
            subjectAltNames:
            - dns: jaeger-storage-es-http.observability.svc.cluster.local
            - dns: jaeger-storage-es-http.observability.svc
            - dns: jaeger-storage-es-http
      nodeSets:
      - name: default
        count: 1
        config:
          node.roles:
          - master
          - data
          node.store.allow_mmap: false
        volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
            storageClassName: xxx
Jaeger YAML file:
    apiVersion: jaegertracing.io/v1
    kind: Jaeger
    metadata:
      name: tds-jaeger-dev
    spec:
      strategy: production
      collector:
        maxReplicas: 3
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
      storage:
        type: elasticsearch
        options:
          es:
            # server-urls: https://jaeger-storage-es-http:9200
            # server-urls: https://jaeger-storage-es-http.observability.svc:9200
            server-urls: https://jaeger-storage-es-http.observability.svc.cluster.local:9200
            tls: "true"
            tls.ca: /es/certificates/ca.crt
            tls.cert: /es/certificates/tls.crt
            tls.key: /es/certificates/tls.key
          secretName: jaeger-secret
      volumeMounts:
      - name: certificates
        mountPath: /es/certificates/
        readOnly: true
      volumes:
      - name: certificates
        secret:
          secretName: jaeger-storage-es-http-certs-internal
I have tried all of these URLs, but I still get the same error:
# server-urls: https://jaeger-storage-es-http:9200
# server-urls: https://jaeger-storage-es-http.observability.svc:9200
server-urls: https://jaeger-storage-es-http.observability.svc.cluster.local:9200
I have also tried the same config as in this issue: https://github.com/jaegertracing/jaeger-operator/issues/496, but it still does not work.
The error reports a health check timeout, so I used curl to test as follows:
curl -X HEAD -u "elastic:fZdf2d80D238g0mEr03CDM3e" -k "https://172.21.10.94:9200" -v
* About to connect() to 172.21.10.94 port 9200 (#0)
* Trying 172.21.10.94...
* Connected to 172.21.10.94 (172.21.10.94) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=jaeger-storage-es-http.observability.es.local,OU=jaeger-storage
* start date: Dec 16 03:39:31 2020 GMT
* expire date: Dec 16 03:49:31 2021 GMT
* common name: jaeger-storage-es-http.observability.es.local
* issuer: CN=jaeger-storage-http,OU=jaeger-storage
* Server auth using Basic with user 'elastic'
> HEAD / HTTP/1.1
> Authorization: Basic ZWxhc3RpYzpmWmRmMmQ4MEQyMzhnMG1FcjAzQ0RNM2U=
> User-Agent: curl/7.29.0
> Host: 172.21.10.94:9200
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 557
<
With -X HEAD, the response code is 200, but the request hangs and never finishes.
curl --head -u "elastic:fZdf2d80D238g0mEr03CDM3e" -k "https://172.21.10.94:9200" -v
* About to connect() to 172.21.10.94 port 9200 (#0)
* Trying 172.21.10.94...
* Connected to 172.21.10.94 (172.21.10.94) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=jaeger-storage-es-http.observability.es.local,OU=jaeger-storage
* start date: Dec 16 03:39:31 2020 GMT
* expire date: Dec 16 03:49:31 2021 GMT
* common name: jaeger-storage-es-http.observability.es.local
* issuer: CN=jaeger-storage-http,OU=jaeger-storage
* Server auth using Basic with user 'elastic'
> HEAD / HTTP/1.1
> Authorization: Basic ZWxhc3RpYzpmWmRmMmQ4MEQyMzhnMG1FcjAzQ0RNM2U=
> User-Agent: curl/7.29.0
> Host: 172.21.10.94:9200
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
content-type: application/json; charset=UTF-8
< content-length: 557
content-length: 557
<
* Connection #0 to host 172.21.10.94 left intact
With --head, the request finishes successfully.
So I'm wondering whether the problem lies in how ES processes the health check method.
I am getting a similar error while trying to connect from a Jaeger instance to an Elasticsearch cluster created by the Red Hat Elasticsearch operator on OpenShift Container Platform 4.6:
{"level":"fatal","ts":1610517606.3812678,"caller":"collector/main.go:70","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: no Elasticsearch node available","stacktrace":"main.main.func1\n\t/go/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:70\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:129\nruntime.main\n\t/opt/rh/go-toolset-1.14/root/usr/lib/go-toolset-1.14-golang/src/runtime/proc.go:203"}
Here is my Jaeger config:
kind: Jaeger
metadata:
name: afjaeger
spec:
strategy: production
collector:
maxReplicas: 1
resources:
limits:
cpu: 400m
memory: 512Mi
query:
resources:
limits:
cpu: 256m
memory: 128Mi
agent:
strategy: DaemonSet
resources:
limits:
cpu: 256m
memory: 128Mi
storage:
type: elasticsearch
elasticsearch:
nodeCount: 3
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 1
memory: 2Gi
options:
es:
server-urls: https://elasticsearch.openshift-logging.svc.cluster.local:9200
tls.ca: /es/certificates/tls.crt
secretName: jaeger-secret-ka
volumeMounts:
- name: certificates
mountPath: /es/certificates/
readOnly: true
volumes:
- name: certificates
secret:
secretName: test
I used admin-ca from the elasticsearch secret here:
tls.ca: /es/certificates/tls.crt
I have the same issue.
I have solved my problem @luxurine @priyavj08. My config was wrong: the jaeger-secret must contain the keys ES_PASSWORD and ES_USERNAME, but I had written elastic=xxxx directly!
Check the env info in the container. The wrong way:
`$ env`
`elastic=235n718z40NkHPX3Sf6EjcyY123`
The right way:
`$ kubectl create secret generic jaeger-secret --from-literal=ES_PASSWORD=changeme --from-literal=ES_USERNAME=elastic`
`$ env`
`ES_USERNAME=elastic`
`ES_PASSWORD=Qkt5ij7wQhM75sa1q45YB178`
Maybe this is helpful to you @luxurine.
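For anyone who prefers a manifest over `kubectl create secret`, the same secret could be written roughly as below. This is only a sketch: the `observability` namespace is an assumption (it is where the ES service lives in this thread; use whichever namespace your Jaeger instance runs in), and `changeme` is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: jaeger-secret
  namespace: observability   # assumption: namespace of the Jaeger instance
type: Opaque
stringData:
  # The key names matter: they become the env vars seen in the container.
  ES_USERNAME: elastic
  ES_PASSWORD: changeme      # placeholder, replace with the real password
```

After applying it, `env` inside the collector container should list ES_USERNAME and ES_PASSWORD rather than a variable named after the user.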
I have checked my jaeger-secret secret, it seems correct.
@luxurine, given that you are using `-k` with your curl call, I suspect the issue might be related to the TLS certs, not the password. Depending on the Jaeger version that you are using, you can use a -debug image, like `jaegertracing/jaeger-collector-debug:1.21.0`, and check whether the certs are correct. If you can make a curl call without the insecure mode set (`-k`), send us the results and we can check from there.
I have the same issue, but I don't use a self-signed cert. The cert is valid, so I don't need to provide a CA. We also don't use password authentication. Curl works without the `-k` option, and curator can reach the ES cluster.
I had the same issue and figured it out. My ES account did not have the "monitor" permission, so the `HEAD /` request that Jaeger's Elasticsearch client sends as its health check failed.
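As a rough illustration of what granting that permission can look like, here is a role written in Elasticsearch's file-based `roles.yml` format. Treat it as a sketch under assumptions: the role name `jaeger_role` is hypothetical, the `jaeger-*` index pattern and the index privileges are guesses that are not confirmed anywhere in this thread, and how the role gets loaded and assigned to your user depends on how your cluster is deployed. The only part the thread confirms is the `monitor` cluster privilege.

```yaml
# roles.yml (Elasticsearch file-based role definition)
jaeger_role:                 # hypothetical role name
  cluster:
    - monitor                # lets the client's `HEAD /` health check succeed
  indices:
    - names:
        - "jaeger-*"         # assumption: indices Jaeger writes to
      privileges:
        - all                # assumption: narrow this down for production use
```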
Thank you, this ended up being my issue as well. My jaeger secret was correct, and I noticed this in the Jaeger logs:
Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: no Elasticsearch node available"
I will let you know whether giving my role the monitor permission in RBAC resolves the issue.
Edit: yes, that was it. Thank you for the crucial input, linw1995.
Hi @perezjasonr @linw1995, could you please explain in detail how you solved this issue? I am not clear on how to add the permission you mentioned.
Does anybody know how to solve this issue? I am stuck on it right now.
By the way, @jpkrohling, could you please help me?
I am running into this as well.
jaeger-secret
@luxurine, I hope you have resolved the issue. For anyone else facing it, I suspect I have found the cause: the indentation in the Jaeger YAML file is incorrect. `secretName: jaeger-secret` is a sub-option of `storage`, not of `storage.options`.
Hi @luxurine, we have resolved the issue. Thanks for your YAML files, they helped us a lot.
There is just one mistake in your Jaeger YAML file: you placed secretName under options, as a sibling of es. In the correct format, secretName and options are siblings, and both sit directly under storage. You can check the YAML format in the official documentation: https://docs.openshift.com/container-platform/4.4/jaeger/jaeger_install/rhbjaeger-deploying.html#jaeger-deploy-production_jaeger-deploying
Points to note:
- We created a separate secret named es-secret, which holds the ES_USERNAME and ES_PASSWORD details, and passed that secret name in the Jaeger YAML under storage.
- You have to provide the http-certs-internal secret name under volumes.
- If you install Elasticsearch and Jaeger in separate namespaces, you also need to create the es-secret and http-certs-internal secrets in the Jaeger namespace.
- Elasticsearch 8.x is currently not compatible with Jaeger, so avoid that version.
- We also tried installing Elasticsearch 7.x with Helm, but it gave an error, so we installed Elasticsearch using the operator (CRD) instead. For that you first need to install ECK in your cluster: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-eck.html
If you follow these steps, I am sure you will resolve the issue.
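To make the fix concrete, here is a sketch of the original poster's Jaeger manifest with only that one change applied: secretName moved up so that it sits next to options under storage. All values are copied from the manifest posted at the top of the thread. The placement of volumeMounts and volumes at the spec level is an assumption, since the indentation of the original was lost.

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: tds-jaeger-dev
spec:
  strategy: production
  storage:
    type: elasticsearch
    secretName: jaeger-secret   # sibling of options, directly under storage
    options:
      es:
        server-urls: https://jaeger-storage-es-http.observability.svc.cluster.local:9200
        tls: "true"
        tls.ca: /es/certificates/ca.crt
        tls.cert: /es/certificates/tls.crt
        tls.key: /es/certificates/tls.key
  volumeMounts:                 # assumption: spec-level placement
    - name: certificates
      mountPath: /es/certificates/
      readOnly: true
  volumes:
    - name: certificates
      secret:
        secretName: jaeger-storage-es-http-certs-internal
```

With secretName nested under options, the ES_USERNAME and ES_PASSWORD values apparently never reach the collector, which would explain why the health check fails even though the secret itself is correct.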
@harshalzambre-TEKsystems-scm @rishiravikumar-tul-scm Thanks a lot! It could be an error in the documentation. I just copied and edited YAML manifests at that time and did not notice the wrong config format, but it's alright. Thank you again for your response.