beats
Elastic Agent won't start if the default rpc port is used
- Version: Elastic Agent 7.14.2
- Operating System: Ubuntu 20.04
- Steps to Reproduce:
Elastic Agent binds to port 6789 by default. When another application is already using that port, the agent cannot start.
Enrollment/installation (via Fleet Server) doesn't return any error, only INFO messages:
root@server:~/elastic-agent-7.14.2-linux-x86_64# ./elastic-agent install -f --url=https://URL:443 --enrollment-token=TOKEN
2021-09-22T18:40:00.862+0200 INFO cmd/enroll_cmd.go:396 Starting enrollment to URL: https://URL:443/
2021-09-22T18:40:02.024+0200 INFO cmd/enroll_cmd.go:232 Elastic Agent might not be running; unable to trigger restart
2021-09-22T18:40:02.024+0200 INFO cmd/enroll_cmd.go:234 Successfully triggered restart on running Elastic Agent.
Successfully enrolled the Elastic Agent.
Elastic Agent has been successfully installed.
The messages indicate that the agent has been restarted, enrolled, and installed. However, the agent is not running, and all we see in Kibana is that the agent is "Updating".
Only journalctl shows the real problem:
sep 22 18:52:05 server systemd[1]: Started Elastic Agent is a unified agent to observe, monitor and protect your system..
sep 22 18:52:05 server elastic-agent[2085818]: starting GRPC listener: listen tcp 127.0.0.1:6789: bind: address already in use
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Main process exited, code=exited, status=1/FAILURE
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Failed with result 'exit-code'
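Since the bind failure only shows up in the journal, a small script can help triage it. The following is a minimal sketch, not part of Elastic Agent: the function name `find_port_collisions` is hypothetical, and the pattern assumes the error text has the same shape as in the excerpt above.

```python
import re

# Matches the bind error as it appears in the journal excerpt above.
BIND_ERROR = re.compile(
    r"listen tcp (?P<host>[\d.]+):(?P<port>\d+): bind: address already in use"
)


def find_port_collisions(lines):
    """Return the (host, port) pairs named in 'address already in use' errors."""
    hits = []
    for line in lines:
        match = BIND_ERROR.search(line)
        if match:
            hits.append((match.group("host"), int(match.group("port"))))
    return hits
```

The input could be the lines printed by `journalctl -u elastic-agent --no-pager`, for example.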
So we have to edit elastic-agent.yml under /opt/Elastic/Agent, and add a different grpc port:
agent.grpc:
  address: localhost
  port: 16789
And then run /opt/Elastic/Agent/elastic-agent restart
Elastic Agent should at least detect this port collision during installation and display an error message warning the user about the problem.
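Until the installer reports the collision itself, the port can be checked by hand before running the install. This is a rough sketch of such a pre-flight check and not how Elastic Agent does it; the helper name `port_is_free` is hypothetical, and the defaults 127.0.0.1 and 6789 are taken from the journal output above.

```python
import socket


def port_is_free(host: str = "127.0.0.1", port: int = 6789) -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() raises OSError (EADDRINUSE) when another socket holds the port.
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

If `port_is_free()` returns False, either free the port or set a different `agent.grpc` port as described above. The check is only a snapshot: another process could still take the port between the check and the agent start.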
cc: @EricDavisX
Pinging @elastic/agent (Team:Agent)
At a minimum, we can probably detect this and put better error logging in place to help triage.
Hi! We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
👍
This is causing issues for my org: a Kubernetes Deployment of a standalone agent hits this error on a cluster that already runs a DaemonSet on each node.
After going through the manifests line by line, we found that the Deployment and the DaemonSet cannot both use the hostNetwork: true flag:
spec:
  hostNetwork: true
Changing it to false solved this error for our Deployment on a Kubernetes cluster where the DaemonSet was already running.
Could this be documented to prevent others from running into it?
_Per this page: https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html
Deploying Elastic Agent to collect cluster-level metrics in large cluster:
The size and the number of nodes in a Kubernetes cluster can be fairly large at times, and in such cases the Pod that will be collecting cluster level metrics might face performance issues due to resources limitations. In this case users might consider to avoid using the leader election strategy and instead run a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet._
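To spot this situation before deploying, one option is to scan the manifests for workloads that request the host network. The sketch below assumes the manifests have already been parsed into plain dictionaries (for example with a YAML library); the function name `host_network_workloads` is hypothetical and not part of any Elastic or Kubernetes tooling.

```python
def host_network_workloads(manifests):
    """Return 'Kind/name' for each workload whose Pod template sets hostNetwork: true."""
    flagged = []
    for manifest in manifests:
        # Deployments and DaemonSets keep the Pod spec under spec.template.spec.
        pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
        if pod_spec.get("hostNetwork") is True:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            flagged.append(f"{manifest.get('kind', 'Unknown')}/{name}")
    return flagged
```

If more than one agent workload is flagged and their Pods can land on the same node, they would share the node's network namespace and compete for the same 127.0.0.1:6789 listener, which matches the error reported in this issue.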
Thanks for this, it saved us a bunch of time. We wanted to run synthetics browser monitors, but the normal DaemonSet requires runAsUser: 0 and synthetics requires runAsUser: 1000, so we needed to combine hostNetwork: false and runAsUser: 1000 for that to work. Thanks
~Still experiencing this with v8.5.3 and editing /opt/Elastic/Agent/elastic-agent.reference.yml doesn't work as the installation seems to have failed (Fleet says the agent status is "updating") and a restart just throws the following socket error:~
$ sudo /opt/Elastic/Agent/elastic-agent restart
Error: Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"
Usage:
elastic-agent restart [flags]
Flags:
-h, --help help for restart
Global Flags:
-c, --c string Configuration file, relative to path.config (default "elastic-agent.yml")
-d, --d string Enable certain debug selectors
-e, --e Log to stderr and disable syslog/file output
--environment environmentVar set environment being ran in (default default)
--path.config string Config path is the directory Agent looks for its config file (default "/opt/Elastic/Agent")
--path.downloads string Downloads path contains binaries Agent downloads
--path.home string Agent root path (default "/opt/Elastic/Agent")
--path.install string Install path contains binaries Agent extracts
--path.logs string Logs path contains Agent log output (default "/opt/Elastic/Agent")
-v, --v Log at INFO level
Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"
EDIT: I just noticed that I changed the wrong file: /opt/Elastic/Agent/elastic-agent.yml is the right one, and the suggested change from the original post works. However, sudo elastic-agent restart didn't work for me, whereas sudo systemctl restart elastic-agent did.
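The error output above suggests why: elastic-agent restart dials the control socket of a running daemon, and that socket does not exist when the agent never started. A hedged sketch of choosing between the two commands follows; the helper name `restart_command` is hypothetical, and the socket path is taken from the error message, so it may differ on other versions or platforms.

```python
import os


def restart_command(socket_path: str = "/run/elastic-agent.sock"):
    """Pick a restart command depending on whether the control socket exists."""
    if os.path.exists(socket_path):
        # A daemon appears to be running, so it can be asked to restart itself.
        return ["elastic-agent", "restart"]
    # No socket: the daemon is not up, so let systemd start the service.
    return ["systemctl", "restart", "elastic-agent"]
```

The returned list can be passed to `subprocess.run` with suitable privileges.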