[Bug] - systemd[1]: refresh-policy-routes@ens5.service: Deactivated successfully.
Describe the bug
On the 28th, one EC2 machine lost communication. The instance passed 2/3 status checks and failed the "Instance reachability" check.
[ec2-user@ip-a-b-c-d ~]$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023.6.20250303"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/"
DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/"
SUPPORT_URL="https://aws.amazon.com/premiumsupport/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
VENDOR_NAME="AWS"
VENDOR_URL="https://aws.amazon.com/"
SUPPORT_END="2029-06-30"
Additional context
journalctl --reverse...
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646833]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646826]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646819]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646812]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646805]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646798]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646789]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:06 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646781]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646766]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: 2025-04-27 23:22:05.5374 INFO [CredentialRefresher] Sleeping for 5m0s before retrying retrieve credentials
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: 2025-04-27 23:22:05.5374 ERROR [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity. Defau>
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646758]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: 2025-04-27 23:22:05.5374 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleTo>
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646750]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: network is unreachable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: caused by: RequestError: send request failed
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: status code: 0, request id:
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: caused by: :
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: 2025-04-27 23:22:05.5373 ERROR [TokenRequestService] failed to retrieve instance identity role. Error: EC2MetadataError: failed to get IMDSv2 token and f>
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646742]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/device-number
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for mac
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/network-card
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646735]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: 2025-04-27 23:22:05.3698 WARN EC2RoleProvider Failed to connect to Systems Manager with instance profile role credentials. Err: retrieved credentials fai>
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: [get_meta] Querying IMDS for mac
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3646668]: Starting configuration refresh for ens5
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Connection to cloud failed (1 tries): 0xc0000001
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Unable to connect to ts01-lanner-lion.cloudsink.net:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Could not connect using DNS Fallback: c0000001
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Connect: Unable to connect to 2a05:d014:45e:4e02:0001:0002:0003:0405:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): trying to connect to 2a05:d014:45e:4e02:0001:0002:0003:0405:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): SslConnect: 2a05:d014:45e:4e02:0001:0002:0003:0405:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Failed to connect via dns bypass server: c0000001
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Connect: Unable to connect to 3.121.238.86:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): connect() failed: 65
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): trying to connect to 3.121.238.86:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): SslConnect: 3.121.238.86:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Could not connect directly: c0000001
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Connect: Unable to resolve ts01-lanner-lion.cloudsink.net, getaddrinfo returned -2
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): trying to connect to ts01-lanner-lion.cloudsink.net:443
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646726]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): SslConnect: Unable to connect to ts01-lanner-lion.cloudsink.net:443 via Application Proxy: c0000225
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): ConnectWithProxy: Unable to get application proxy host from CsConfig: c0000225
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): Could not retrieve DisableProxy value: c0000225
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: sysstat-collect.service: Deactivated successfully.
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): SslConnect: ts01-lanner-lion.cloudsink.net:443
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Starting update-motd.service - Dynamically Generate Message Of The Day...
Apr 27 22:50:01 ip-a-b-c-d.eu-south-2.compute.internal chronyd[2254]: Can't synchronise: no selectable sources
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[3646703]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Apr 27 22:43:35 ip-a-b-c-d.eu-south-2.compute.internal chronyd[2254]: Selected source 169.254.169.123
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Starting refresh-policy-routes@ens5.service - Refresh policy routes for ens5...
Apr 27 22:43:34 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): ConnectToCloud starts
Apr 27 22:37:07 ip-a-b-c-d.eu-south-2.compute.internal systemd-networkd[2055]: ens5: Failed
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostna>
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostn>
Apr 27 22:42:34 ip-a-b-c-d.eu-south-2.compute.internal chronyd[2254]: Source 3.8.121.220 replaced with 35.178.178.65 (time.aws.com)
Apr 27 23:22:04 ip-a-b-c-d.eu-south-2.compute.internal audit: BPF prog-id=1775 op=LOAD
Apr 27 22:35:14 ip-a-b-c-d.eu-south-2.compute.internal systemd-networkd[2055]: ens5: Could not set DHCPv4 address: Connection timed out
Apr 27 22:12:54 ip-a-b-c-d.eu-south-2.compute.internal chronyd[2254]: Can't synchronise: no selectable sources
Apr 27 22:12:10 ip-a-b-c-d.eu-south-2.compute.internal falcon-sensor-bpf[3007796]: CrowdStrike(4): SSLSocket Disconnected from Cloud.
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=refresh-policy-routes@ens5 comm="systemd" exe="/usr/lib/systemd/syst>
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=refresh-policy-routes@ens5 comm="systemd" exe="/usr/lib/systemd/sys>
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Finished refresh-policy-routes@ens5.service - Refresh policy routes for ens5.
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: refresh-policy-routes@ens5.service: Deactivated successfully.
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Got IMDSv2 token from http://169.254.169.254/latest
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: [get_meta] Querying IMDS for network/interfaces/macs/06:03:d9:4a:4b:f0/local-ipv4s
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Got IMDSv2 token from http://169.254.169.254/latest
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: [get_meta] Querying IMDS for mac
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Got IMDSv2 token from http://169.254.169.254/latest
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: [get_meta] Querying IMDS for mac
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Using existing cfgfile /run/systemd/network/70-ens5.network
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Got IMDSv2 token from http://169.254.169.254/latest
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: [get_meta] Querying IMDS for mac
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Got IMDSv2 token from http://169.254.169.254/latest
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: [get_meta] Querying IMDS for mac
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal ec2net[3645398]: Starting configuration refresh for ens5
Apr 27 22:04:19 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Starting refresh-policy-routes@ens5.service - Refresh policy routes for ens5...
**How I solved it:** I stopped and started the instance, and on the 29th it was working again.
Any suggestions or clarification about what happened?
Today, 10 June, I had to reboot the machine again:
journalctl --reverse:
...
Jun 10 06:21:37 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Failed to start refresh-policy-routes@ens5.service - Refresh policy routes for ens5.
Jun 10 06:21:37 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: refresh-policy-routes@ens5.service: Failed with result 'exit-code'.
Jun 10 06:21:37 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: refresh-policy-routes@ens5.service: Main process exited, code=exited, status=1/FAILURE
...
Any suggestions?
We have since patched the machine with these releases:
https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.7.20250428.html
https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.7.20250512.html
https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.7.20250527.html
Hello @j-riobello2, from the logs you posted, amazon-ec2-net-utils is having difficulty getting the IMDS token it needs to make IMDS requests. Without the IMDS information it uses to set up the systemd network configuration, the interface ends up misconfigured, which results in the loss of connectivity.
From the SSM agent log below, we can see that it could not retrieve the token, and we can assume amazon-ec2-net-utils is having the same trouble. The refresh-policy-routes@ens5.service unit is started by net-utils to refresh the information from IMDS, and it exits with a failure because it can't reach the 169.254.169.254 endpoint.
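To check whether another instance is hitting the same pattern, one option is to count the three error signatures quoted in this thread. This is a minimal sketch, not part of amazon-ec2-net-utils: the function name `imds_failure_summary` is hypothetical, and it assumes GNU grep and that the relevant entries are still in the journal.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with amazon-ec2-net-utils): counts the
# failure signatures seen in this thread in journal text read from stdin.
# Assumes GNU grep.

imds_failure_summary() {
  local input unbound unreachable failed
  # Read stdin once so every pattern is counted against the same text
  input=$(cat)

  # Bash "unbound variable" error logged by setup-policy-routes (lib.sh line 89 above)
  unbound=$(grep -cF 'imds_endpoint: unbound variable' <<<"$input")
  # SSM agent error showing the kernel had no route to the IMDS address
  unreachable=$(grep -cF '169.254.169.254:80: connect: network is unreachable' <<<"$input")
  # systemd reporting that the policy-route refresh unit failed
  failed=$(grep -cE 'refresh-policy-routes@[^ ]*: Failed with result' <<<"$input")

  printf 'unbound_imds_endpoint=%s\n' "$unbound"
  printf 'imds_network_unreachable=%s\n' "$unreachable"
  printf 'refresh_policy_routes_failed=%s\n' "$failed"
}
```

For example, `journalctl --since "2025-04-27" | imds_failure_summary`. Non-zero counts only show that the symptoms match this thread; they do not explain why 169.254.169.254 became unreachable.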
Is there any reason why the IP 169.254.169.254 would not be reachable? This IP is reserved by AWS for IMDS, and the instance must be able to reach it. You can read more about IMDS here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#imds-considerations
Apr 27 23:22:05 ip-a-b-c-d.eu-south-2.compute.internal amazon-ssm-agent[2213]: caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: network is unreachable
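To answer the reachability question on an affected instance, the two IMDSv2 requests can be replayed by hand. This is a minimal sketch, assuming `curl` and the iproute2 `ip` command are available. `imds_check` and the `IMDS_BASE` override are hypothetical names introduced here; the override exists only so the function can be exercised against a non-IMDS address. The token endpoint, the two header names and the `mac` key follow the IMDSv2 documentation linked above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replays the IMDSv2 token + metadata requests that
# amazon-ec2-net-utils and the SSM agent depend on. IMDS_BASE is an
# illustrative override; it defaults to the real IMDS address.

imds_check() {
  local base="${IMDS_BASE:-http://169.254.169.254}"
  local token mac

  # Ask the kernel which route it would use for the IMDS address.
  # With no route, iproute2 reports "Network is unreachable".
  if command -v ip >/dev/null 2>&1; then
    ip route get 169.254.169.254 || echo "no route to 169.254.169.254" >&2
  fi

  # IMDSv2 step 1: PUT request for a session token (TTL in seconds)
  if ! token=$(curl -fsS --max-time 2 -X PUT "$base/latest/api/token" \
      -H "X-aws-ec2-metadata-token-ttl-seconds: 60"); then
    echo "IMDSv2 token request to $base failed" >&2
    return 1
  fi

  # IMDSv2 step 2: use the token to read the primary MAC, a key ec2net also queries
  if ! mac=$(curl -fsS --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
      "$base/latest/meta-data/mac"); then
    echo "IMDS metadata request with token failed" >&2
    return 1
  fi

  echo "IMDS reachable; primary MAC: $mac"
}
```

Running `imds_check` while the instance is in the failed state should show whether the kernel still has a route to 169.254.169.254. A missing route matches the "network is unreachable" error above, whereas a route that exists but requests that time out would point more towards something in the traffic path.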
Thanks @joeysk2012,
I think the problem may be related to some ingress/egress traffic servers that our security team has placed at the edge of this machine's network. It happened again yesterday:
---------------------------TRACE-----------------------
Jun 26 04:49:21 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Starting refresh-policy-routes@ens5.service - Refresh policy routes for ens5...
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=refresh-policy-routes@ens5 comm="systemd" exe="/usr/lib/systemd/sys>
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: Failed to start refresh-policy-routes@ens5.service - Refresh policy routes for ens5.
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: refresh-policy-routes@ens5.service: Failed with result 'exit-code'.
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal systemd[1]: refresh-policy-routes@ens5.service: Main process exited, code=exited, status=1/FAILURE
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal ec2net[119326]: Unable to identify device-number for ens5 in IMDS
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal ec2net[119326]: Unable to identify device-number for ens5 after 60 attempts
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[119771]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal ec2net[119326]: [get_meta] Querying IMDS for network/interfaces/macs/xx:xx:xx:xx:xx:xx/device-number
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[119764]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal ec2net[119326]: [get_meta] Querying IMDS for network/interfaces/macs/xx:xx:xx:xx:xx:xx/device-number
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal setup-policy-routes[119757]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Jun 26 04:47:28 ip-a-b-c-d.eu-south-2.compute.internal ec2net[119326]: [get_meta] Querying IMDS for network/interfaces/macs/xx:xx:xx:xx:xx:xx/device-number
---------------------------END TRACE-----------------------
We patched the server again:
sudo dnf upgrade --releasever=2023.7.20250609 -x gitlab-ee
sudo dnf upgrade --releasever=2023.7.20250623 -x gitlab-ee
The 2023.7.20250609 release notes mention an update to amazon-ec2-net-utils-2.6.0-1.amzn2023.0.1, although I still think the problem stems from a communication issue with those ingress/egress servers.
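After patching, it can help to confirm which amazon-ec2-net-utils version is actually installed, since 2023.7.20250609 is the release that mentions 2.6.0. A minimal sketch, assuming `rpm` and GNU `sort -V` are available; `version_ge` and `net_utils_at_least` are hypothetical helper names, and the 2.6.0 default only mirrors the version quoted above. Nothing in this thread confirms that 2.6.0 changes the behavior seen in the logs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the installed amazon-ec2-net-utils version
# against a minimum (default 2.6.0, as named in the 2023.7.20250609 notes).

version_ge() {
  # True if $1 >= $2 in GNU version-sort order
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

net_utils_at_least() {
  local want="${1:-2.6.0}" have
  # %{VERSION} is the upstream version without the release suffix
  if ! have=$(rpm -q --qf '%{VERSION}\n' amazon-ec2-net-utils 2>/dev/null); then
    echo "amazon-ec2-net-utils is not installed (or rpm is unavailable)" >&2
    return 2
  fi
  if version_ge "$have" "$want"; then
    echo "amazon-ec2-net-utils $have >= $want"
  else
    echo "amazon-ec2-net-utils $have < $want" >&2
    return 1
  fi
}
```

For example, `net_utils_at_least 2.6.0` prints the installed version and returns 1 if it is older. `sort -V` is used instead of a plain string comparison so that, for instance, 2.10.0 is treated as newer than 2.6.0.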
Best regards.
I have the same problem. Session Manager freezes under high ingress load, and I see the same errors in the system logs.