amazon-linux-2023

[Bug] - updating aws-cfn-bootstrap

Open danie-dejager opened this issue 1 year ago • 10 comments

Describe the bug

When dnf is updating aws-cfn-bootstrap I see the following error:

 Running scriptlet: aws-cfn-bootstrap-2.0-29.amzn2023.noarch                                                                                                                    18/22 
  Cleanup          : aws-cfn-bootstrap-2.0-29.amzn2023.noarch                                                                                                                    18/22 
  Running scriptlet: aws-cfn-bootstrap-2.0-29.amzn2023.noarch                                                                                                                    18/22 
Failed to set unit properties on aws-cfn-bootstrap.service: Unit aws-cfn-bootstrap.service not found.

To Reproduce

Steps to reproduce the behavior:

  1. update to 2023.4.20240611
  2. observe aws-cfn-bootstrap update

danie-dejager avatar Jun 11 '24 06:06 danie-dejager

The aws-cfn-bootstrap.service unit is created by cfn-init, so if no CloudFormation initialization was done the unit file is missing. The RPM provides scripts for package upgrade and uninstall only. Run rpm -q --scripts aws-cfn-bootstrap for details.

On package upgrade it runs:

if [ $1 -ge 1 ] && [ -x "/usr/lib/systemd/systemd-update-helper" ]; then
    # Package upgrade, not uninstall
    /usr/lib/systemd/systemd-update-helper mark-restart-system-units aws-cfn-bootstrap.service || :
fi

Hence the "Failed to set unit properties" warning.
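For readers unfamiliar with RPM scriptlet arguments, here is a minimal sketch of how the `$1` test in the snippet above separates the two cases. It assumes the snippet is a %postun scriptlet (its inline comment suggests so), where `$1` is the number of instances of the package that remain after the transaction. The function name `scriptlet_action` is made up for illustration and is not part of the package.

```shell
# Hypothetical helper: classify a %postun invocation by its first argument.
# Assumption: $1 is the count of package instances left after the transaction
# (0 on uninstall, 1 or more on upgrade).
scriptlet_action() {
    if [ "$1" -ge 1 ]; then
        echo "upgrade"
    else
        echo "uninstall"
    fi
}

scriptlet_action 1   # the branch that calls systemd-update-helper
scriptlet_action 0   # the branch that skips it
```

Only the first call reaches the `mark-restart-system-units` line, which is why the message shows up during an update but not during a removal.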

elsaco avatar Jun 12 '24 15:06 elsaco

Got the same problem; didn't even notice it right away.

With the next metadata update, cfn got stuck in a retry loop and did not even run the next steps.

Would have been bad to notice this in production.

margussipria avatar Jun 19 '24 12:06 margussipria

I've reached out to the relevant internal team about the issue.

stewartsmith avatar Jun 26 '24 18:06 stewartsmith

It sounds like there are two things being reported.

  • The original warning message: Elsaco's comment is correct about the source of the warning (systemd failing to find a service unit) but misattributes its cause. The package uses standard systemd macros to restart any relevant service. However, "aws-cfn-bootstrap.service" is not defined. If you examine the scripts under /opt/aws/bin and/or pull the SRPM file, you can see that cfn-init does not generate the service unit, nor is it defined anywhere in the package. "cfn-hup.service" exists, but I will refrain from commenting on whether this is what was intended, as other members of the team are responsible for this package. Either way, as Stewart noted above, this is a bug in our packaging and will be addressed in a future release.
  • CloudFormation retry loop: The systemd message is a benign warning ("there's nothing for me to work with so I won't do anything"). Despite the confusion above about systemd behavior & unit names, an install/upgrade/reinstall/uninstall operation will complete successfully after displaying the message. Any bootstrap failure and retry loop CloudFormation is encountering is a result of a different problem. @margussipria do you have any logs or configuration you can share to help identify what's failing and causing CloudFormation to retry?
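As a rough illustration of why the message cannot fail the transaction, the sketch below mimics the `|| :` suffix from the scriptlet quoted earlier. `fake_update_helper` is a made-up stand-in for systemd-update-helper; it assumes the worst case of a non-zero exit, which the thread does not confirm for the real helper.

```shell
# Hypothetical stand-in for systemd-update-helper: prints the warning
# to stderr and fails (assumed worst case, not confirmed behavior).
fake_update_helper() {
    echo "Failed to set unit properties on $2: Unit $2 not found." >&2
    return 1
}

# Same shape as the packaged scriptlet line: "|| :" runs the no-op
# builtin ":" when the helper fails, so the overall status is 0.
run_helper_quietly() {
    fake_update_helper mark-restart-system-units aws-cfn-bootstrap.service || :
}

run_helper_quietly
echo "scriptlet exit status: $?"
```

The warning is still printed, but the status seen by the package manager is 0, so the package operation proceeds.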

LordAlfredo avatar Jun 26 '24 22:06 LordAlfredo

We have had this code for basically 8 years, since Amazon Linux 1: cloudformation.template.txt

An example of it is still up in the example document: https://s3.amazonaws.com/cloudformation-templates-us-east-1/LAMP_Single_Instance.template (JSON)

It has worked for 8+ years, although the YumUpdate parameter is now the --releasever value for yum. But this version broke everything.

  1. The loop happened because it still tried to run this part while cfn-hup was missing:
          services:
            sysvinit:
              cfn-hup:
                enabled: true
                ensureRunning: true
                files:
                  - /etc/cfn/cfn-hup.conf
                  - /etc/cfn/hooks.d/cfn-auto-reloader.conf
                  - /etc/cfn/hooks.d/update.conf

After removing those lines, updates worked until the first boot.

  2. After boot, updating instance metadata has no effect.

margussipria avatar Jun 27 '24 12:06 margussipria

basically

$ systemctl enable cfn-hup
Failed to enable unit: Unit file cfn-hup.service does not exist.

$ yum downgrade aws-cfn-bootstrap-2.0-29.amzn2023.noarch
# download log
$ systemctl enable cfn-hup
$ echo $?
0

$ yum update -y
# download log
$ systemctl enable cfn-hup
Failed to enable unit: Unit file cfn-hup.service does not exist.
$ echo $?
1

margussipria avatar Jun 27 '24 13:06 margussipria

cfn-hup unit is present in aws-cfn-bootstrap-2.0-30.amzn2023.noarch package:

[ec2-user@i-0c44ccb32089a355e ~]$ rpm -qf /opt/aws/apitools/cfn-init/init/systemd/cfn-hup.service
aws-cfn-bootstrap-2.0-30.amzn2023.noarch
[ec2-user@i-0c44ccb32089a355e ~]$ systemctl status --no-pager cfn-hup
○ cfn-hup.service - cfn-hup daemon
     Loaded: loaded (/usr/lib/systemd/system/cfn-hup.service; disabled; preset: disabled)
     Active: inactive (dead)

Jun 27 15:46:38 i-0c44ccb32089a355e.ec2.internal systemd[1]: /usr/lib/systemd/system/cfn-hup.service:6: PIDFile= references a path below legacy directory /var/run/, updating /var/run/cfn-hup.pid → /run/cfn-hup.pid; please update the unit file accordingly.
Jun 27 15:55:15 i-0c44ccb32089a355e.ec2.internal systemd[1]: /usr/lib/systemd/system/cfn-hup.service:6: PIDFile= references a path below legacy directory /var/run/, updating /var/run/cfn-hup.pid → /run/cfn-hup.pid; please update the unit file accordingly.
Jun 27 15:55:23 i-0c44ccb32089a355e.ec2.internal systemd[1]: /usr/lib/systemd/system/cfn-hup.service:6: PIDFile= references a path below legacy directory /var/run/, updating /var/run/cfn-hup.pid → /run/cfn-hup.pid; please update the unit file accordingly.

It is also present in aws-cfn-bootstrap-2.0-29 package:

[ec2-user@i-0c44ccb32089a355e ~]$ rpm -qpl aws-cfn-bootstrap-2.0-29.amzn2023.noarch.rpm  | grep cfn-hup
/opt/aws/apitools/cfn-init-2.0-29/bin/cfn-hup
/opt/aws/apitools/cfn-init-2.0-29/init/redhat/cfn-hup
/opt/aws/apitools/cfn-init-2.0-29/init/systemd/cfn-hup.service
/opt/aws/apitools/cfn-init-2.0-29/init/ubuntu/cfn-hup
/opt/aws/bin/cfn-hup
/usr/bin/cfn-hup
/usr/lib/systemd/system/cfn-hup.service

elsaco avatar Jun 27 '24 16:06 elsaco

Correct - I'm conferring with the package maintainer about whether cfn-hup.service was their intended systemd target in the package, since the pre/post scriptlets currently point to aws-cfn-bootstrap.service. The latter isn't defined anywhere, hence the warning message.

Margus's error appears to be a result of the version upgrade from 2.0-29 to 2.0-30 removing cfn-hup entirely. We will investigate further.

LordAlfredo avatar Jun 27 '24 18:06 LordAlfredo

$ uname -a
Linux host-test1.localdomain 6.1.92-99.174.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun  4 15:43:46 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

The original AMI used for this machine was ami-04fe22dfadec6f0b6 (eu-west-1). Every AMI for Amazon Linux 2023 has worked, including with upgrades, except for these upgrades:

$ history | grep release
    4  dnf upgrade --releasever=2023.4.20240611
   36  dnf upgrade --releasever=2023.5.20240624
[root@(host) ~]$ rpm -qf /opt/aws/apitools/cfn-init/init/systemd/cfn-hup.service
aws-cfn-bootstrap-2.0-30.amzn2023.noarch
[root@(host) ~]$ systemctl status --no-pager cfn-hup
Unit cfn-hup.service could not be found.

and with downgrade:

[root@(host) ~]$ rpm -qf /opt/aws/apitools/cfn-init/init/systemd/cfn-hup.service
aws-cfn-bootstrap-2.0-29.amzn2023.noarch
[root@(host) ~]$ systemctl status --no-pager cfn-hup
○ cfn-hup.service - cfn-hup daemon
     Loaded: loaded (/etc/systemd/system/cfn-hup.service; enabled; preset: disabled)
     Active: inactive (dead)

Jun 27 09:24:24 host-test1.localdomain systemd[1]: cfn-hup.service: Failed to open /etc/systemd/system/cfn-hup.service: No such file or directory
Jun 27 10:19:24 host-test1.localdomain systemd[1]: cfn-hup.service: Failed to open /etc/systemd/system/cfn-hup.service: No such file or directory
Jun 27 10:19:25 host-test1.localdomain systemd[1]: cfn-hup.service: Failed to open /etc/systemd/system/cfn-hup.service: No such file or directory
Jun 27 19:41:30 host-test1.localdomain systemd[1]: /etc/systemd/system/cfn-hup.service:6: PIDFile= references a path below legacy directory /var/run/, updating /var/run/cfn-hup.pid → /run/cfn-hup.pid; please update the unit file accordingly.

margussipria avatar Jun 27 '24 19:06 margussipria

From what I was able to see, the issue is simply a symbolic link that was not updated:

On a system initialized with aws-cfn-bootstrap-2.0-29.amzn2023.noarch we see the following link in the enabled service:

lrwxrwxrwx. 1 root root 62 Apr 3 07:41 /etc/systemd/system/cfn-hup.service -> /opt/aws/apitools/cfn-init-2.0-29/init/systemd/cfn-hup.service

While with aws-cfn-bootstrap-2.0-30.amzn2023.noarch:

lrwxrwxrwx. 1 root root 62 Aug 22 14:55 /etc/systemd/system/cfn-hup.service -> /opt/aws/apitools/cfn-init-2.0-30/init/systemd/cfn-hup.service

Funnily enough, in both cases the source service points to a generic link, but systemctl resolves the link somehow:

lrwxrwxrwx. 1 root root 67 Apr 30 20:01 /lib/systemd/system/cfn-hup.service -> ../../../..//opt/aws/apitools/cfn-init/init/systemd/cfn-hup.service

Previous versions such as aws-cfn-bootstrap-2.0-23.amzn2023.noarch didn't even have the service link, so when updating from those the service is not re-enabled either.

The simplest workaround in the meantime is to remove the dangling link for the enabled service and re-enable it:

# /bin/rm /etc/systemd/system/cfn-hup.service
# systemctl enable --now cfn-hup
Created symlink /etc/systemd/system/cfn-hup.service → /opt/aws/apitools/cfn-init-2.0-30/init/systemd/cfn-hup.service.
Created symlink /etc/systemd/system/multi-user.target.wants/cfn-hup.service → /opt/aws/apitools/cfn-init-2.0-30/init/systemd/cfn-hup.service.
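Before deleting anything, it may be worth confirming that the link really is dangling. The sketch below is one way to check; `is_dangling_symlink` is a hypothetical helper name, and the script only prints a message rather than removing the file.

```shell
# Hypothetical helper: succeeds only if the path is a symlink
# whose target no longer exists.
is_dangling_symlink() {
    # -L is true for a symlink itself; -e follows the link, so it is
    # false when the target is missing.
    [ -L "$1" ] && [ ! -e "$1" ]
}

link=/etc/systemd/system/cfn-hup.service
if is_dangling_symlink "$link"; then
    echo "dangling link: $link -> $(readlink "$link")"
else
    echo "not a dangling link: $link"
fi
```

If the first message appears, the rm and systemctl enable steps from the workaround above apply; otherwise the service link is either absent or already valid.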

bplessis-swi avatar Aug 23 '24 07:08 bplessis-swi