fleet-server "failed to fetch elasticsearch version" - ECK install on OpenShift isn't working
Elasticsearch Version
Version: 8.15.2, Build: docker/98adf7bf6bb69b66ab95b761c9e5aadb0bb059a3/2024-09-19T10:06:03.564235954Z, JVM: 22.0.1
Installed Plugins
No response
Java Version
bundled
OS Version
OpenShift BareMetal
Problem Description
I have deployed ECK on OpenShift bare-metal servers for a POC. While I can access the Kibana dashboard, I cannot get fleet-server to start and work. I am using the default configuration (from these docs: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-openshift-deploy-the-operator.html and https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-quickstart.html) for the most part, with small modifications where needed.
These are my manifests:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 8.15.2
  count: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: 1Gi
            cpu: 1
  config:
    server.publicBaseUrl: "https://#######"
    xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-sample-es-http.elastic.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-sample-agent-http.elastic.svc:8220"]
    xpack.fleet.packages:
    - name: system
      version: latest
    - name: elastic_agent
      version: latest
    - name: fleet_server
      version: latest
    - name: apm
      version: latest
    xpack.fleet.agentPolicies:
    - name: Fleet Server on ECK policy
      id: eck-fleet-server
      namespace: elastic
      monitoring_enabled:
      - logs
      - metrics
      unenroll_timeout: 900
      package_policies:
      - name: fleet_server-1
        id: fleet_server-1
        package:
          name: fleet_server
    - name: Elastic Agent on ECK policy
      id: eck-agent
      namespace: elastic
      monitoring_enabled:
      - logs
      - metrics
      unenroll_timeout: 900
      package_policies:
      - name: system-1
        id: system-1
        package:
          name: system
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
      index.store.type: niofs # https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-store.html
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 8.15.2
  count: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
  kibanaRef:
    name: kibana-sample
  podTemplate:
    spec:
      serviceAccountName: apm-server
Agent state:
oc get agents
NAME                   HEALTH   AVAILABLE   EXPECTED   VERSION   AGE
elastic-agent-sample   green    3           3          8.15.2    138m
fleet-server-sample    red                  1          8.15.2    138m
oc describe agent fleet-server-sample
Name:         fleet-server-sample
Namespace:    elastic
Labels:       <none>
Annotations:  ###
API Version:  agent.k8s.elastic.co/v1alpha1
Kind:         Agent
Metadata:     ###
Spec:
  Deployment:
    Pod Template:
      Metadata:
        Creation Timestamp:  <nil>
      Spec:
        Automount Service Account Token:  true
        Containers:                       <nil>
        Security Context:
          Run As User:         0
        Service Account Name:  elastic-agent
        Volumes:
          Name:  agent-data
          Persistent Volume Claim:
            Claim Name:  fleet-server-sample
    Replicas:  1
    Strategy:
  Elasticsearch Refs:
    Name:                 elasticsearch-sample
  Fleet Server Enabled:   true
  Fleet Server Ref:
  Http:
    Service:
      Metadata:
      Spec:
    Tls:
      Certificate:
  Kibana Ref:
    Name:     kibana-sample
  Mode:       fleet
  Policy ID:  eck-fleet-server
  Version:    8.15.2
Status:
  Elasticsearch Associations Status:
    elastic/elasticsearch-sample:  Established
  Expected Nodes:                  1
  Health:                          red
  Kibana Association Status:       Established
  Observed Generation:             2
  Version:                         8.15.2
Events:
  Type     Reason                   Age                   From                                 Message
  ----     ------                   ----                  ----                                 -------
  Warning  AssociationError         138m (x5 over 138m)   agent-controller                     Association backend for elasticsearch is not configured
  Warning  AssociationError         138m (x9 over 138m)   agent-controller                     Association backend for kibana is not configured
  Normal   AssociationStatusChange  138m                  agent-es-association-controller      Association status changed from [] to [elastic/elasticsearch-sample: Established]
  Normal   AssociationStatusChange  138m                  agent-kibana-association-controller  Association status changed from [] to [Established]
  Warning  Delayed                  138m (x11 over 138m)  agent-controller                     Delaying deployment of Elastic Agent in Fleet Mode as Kibana is not available yet
fleet-server pod error logs (the pod is in CrashLoopBackOff):
{"log.level":"error","@timestamp":"2024-10-14T16:35:35.550Z","message":"failed to fetch elasticsearch version","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"@timestamp":"2024-10-14T16:35:35.55Z","ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","error.message":"dial tcp [::1]:9200: connect: connection refused","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-10-14T16:35:35.551Z","message":"Failed Elasticsearch output configuration test, using bootstrap values.","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","error.message":"dial tcp [::1]:9200: connect: connection refused","output":{"hosts":["localhost:9200"],"protocol":"https","proxy_disable":false,"proxy_headers":{},"service_token":"#####","ssl":{"certificate_authorities":["/mnt/elastic-internal/elasticsearch-association/elastic/elasticsearch-sample/certs/ca.crt"],"verification_mode":"full"},"type":"elasticsearch"},"@timestamp":"2024-10-14T16:35:35.55Z","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:35.612Z","message":"panic: runtime error: invalid memory address or nil pointer dereference","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x55df2cba3217]","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"goroutine 279 [running]:","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).configFromUnits(0xc000002240, {0x55df2d489218, 0xc000486370})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:441 +0x97","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).start(0xc000002240, {0x55df2d489218, 0xc000486370})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:344 +0x51","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).reconfigure(0xc0002fd728?, {0x55df2d489218?, 0xc000486370?})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:387 +0x8d","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.013Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).Run.func5()","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.013Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:204 +0x5c5","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.148Z","message":"created by github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).Run in goroutine 1","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.148Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:162 +0x416","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.515Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":647},"message":"Component state changed fleet-server-default (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.515Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed fleet-server-default-fleet-server (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default-fleet-server","type":"input","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.516Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed fleet-server-default (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default","type":"output","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:45.612Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/cmd.logReturn","file.name":"cmd/run.go","file.line":162},"message":"2 errors occurred:\n\t* timeout while waiting for managers to shut down: no response from runtime manager, no response from vars manager\n\t* config manager: failed to initialize Fleet Server: context deadline exceeded\n\n","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
Error: 2 errors occurred:
* timeout while waiting for managers to shut down: no response from runtime manager, no response from vars manager
* config manager: failed to initialize Fleet Server: context deadline exceeded
From the logs it appears that the fleet-server pod is trying to reach the Elasticsearch cluster at localhost instead of sending requests to the Elasticsearch service. There are other errors as well, but I think this one needs to be resolved first.
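To double-check whether this is just a networking problem, the Elasticsearch service can be curled from inside the fleet-server pod, reusing the CA that is already mounted there (the pod name below is a placeholder, and this assumes curl is available in the agent image; if the service is reachable, an unauthenticated request should come back as an HTTP 401 rather than "connection refused"):

# Placeholder pod name; the CA path is the one shown in the fleet-server log output above.
oc exec -n elastic <fleet-server-pod> -- \
  curl -sS --cacert /mnt/elastic-internal/elasticsearch-association/elastic/elasticsearch-sample/certs/ca.crt \
  https://elasticsearch-sample-es-http.elastic.svc:9200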
Errors in kibana pod:
[2024-10-14T16:17:47.714+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. Request timed out
Steps to Reproduce
Deploy an ECK cluster using the manifests above, which are mostly defaults with some changes. See the example command below.
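For example, assuming the manifests above are saved in a single file (the filename is just a placeholder):

# Apply the manifests into the same namespace the resources above run in.
oc apply -n elastic -f eck-poc.yaml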
Logs (if relevant)
No response