Password randomization failure blocks ssh key install

Open andrewhamon opened this issue 1 year ago • 4 comments

If I make an AMI where I create a password, subsequent runs of ec2-macos-init will fail before they ever get a chance to install the default ssh key.

Here is an example run:

2024/08/22 23:47:30.984552 Fetching instance ID from IMDS...
2024/08/22 23:47:30.987089 Running on instance i-0bf7783cd5199dc8d
2024/08/22 23:47:30.987130 Reading init config...
2024/08/22 23:47:30.989063 Successfully read init config
2024/08/22 23:47:30.989097 Validating config...
2024/08/22 23:47:30.989257 Successfully validated config
2024/08/22 23:47:30.989268 Prioritizing modules...
2024/08/22 23:47:30.989290 Successfully prioritized modules
2024/08/22 23:47:30.989299 Creating instance history directories for current instance...
2024/08/22 23:47:30.989585 Successfully created directories
2024/08/22 23:47:30.989598 Getting instance history...
2024/08/22 23:47:30.989782 Successfully gathered instance history
2024/08/22 23:47:30.989793 Processing priority level 1 (2 modules)...
2024/08/22 23:47:30.989819 Running module [UnmountLocalSSD] (type: command, group: 1)
2024/08/22 23:47:30.989834 Running module [DisableEthernet] (type: command, group: 1)
2024/08/22 23:47:31.037840 Successfully completed module [DisableEthernet] (type: command, group: 1) with message: successfully ran command [[/usr/sbin/networksetup -setnetworkserviceenabled Ethernet off]] with stdout [] and stderr []
2024/08/22 23:47:31.697122 Successfully completed module [UnmountLocalSSD] (type: command, group: 1) with message: successfully ran command [[/bin/zsh -c diskutil list internal physical | egrep -o '^/dev/disk\d+' | xargs diskutil eject || true]] with stdout [] and stderr [Volume failed to eject]
2024/08/22 23:47:31.698805 Successfully completed processing of priority level 1
2024/08/22 23:47:31.698834 Processing priority level 2 (1 modules)...
2024/08/22 23:47:31.698893 Running module [CheckNetworkIsUp] (type: networkcheck, group: 2)
2024/08/22 23:47:31.738534 Successfully completed module [CheckNetworkIsUp] (type: networkcheck, group: 2) with message: successfully pinged default gateway with a RTT of 266.667µs
2024/08/22 23:47:31.738626 Successfully completed processing of priority level 2
2024/08/22 23:47:31.738647 Processing priority level 3 (12 modules)...
2024/08/22 23:47:31.738696 Running module [GrowRootAPFSVolume] (type: command, group: 3)
2024/08/22 23:47:31.738839 Running module [NeverSleep] (type: command, group: 3)
2024/08/22 23:47:31.738848 Running module [ManageEC2User] (type: usermanagement, group: 3)
2024/08/22 23:47:31.738884 Running module [UpdateMOTD] (type: motd, group: 3)
2024/08/22 23:47:31.738950 Running module [SetDefaultTimezone] (type: command, group: 3)
2024/08/22 23:47:31.739207 Running module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3)
2024/08/22 23:47:31.739451 Running module [SetAmazonTimeSync] (type: command, group: 3)
2024/08/22 23:47:31.739488 Running module [NeverSleepDisplay] (type: command, group: 3)
2024/08/22 23:47:31.739657 Running module [DisableSleep] (type: command, group: 3)
2024/08/22 23:47:31.739639 Running module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3)
2024/08/22 23:47:31.739851 Running module [RemoveSSHGroup] (type: command, group: 3)
2024/08/22 23:47:31.740006 Running module [DisableWiFi] (type: command, group: 3)
2024/08/22 23:47:31.753394 Error while running module [GrowRootAPFSVolume] (type: command, group: 3) with message:  and err: ec2macosinit: error executing command [[/bin/zsh -c ec2-macos-utils grow --id root]] with stdout [] and stderr [zsh:1: command not found: ec2-macos-utils]: exit status 127
2024/08/22 23:47:31.762102 Did not modify sysctl property [kern.aioprocmax=256]
2024/08/22 23:47:31.762206 Did not modify sysctl property [net.inet.tcp.autorcvbufmax=33554432]
2024/08/22 23:47:31.766793 Did not modify sysctl property [kern.aiomax=900]
2024/08/22 23:47:31.766774 Did not modify sysctl property [net.inet.tcp.win_scale_factor=8]
2024/08/22 23:47:31.767872 Did not modify sysctl property [kern.aiothreads=64]
2024/08/22 23:47:31.768859 Did not modify sysctl property [net.inet.tcp.recvspace=1048576]
2024/08/22 23:47:31.769714 Did not modify sysctl property [net.inet.tcp.autosndbufmax=33554432]
2024/08/22 23:47:31.769886 Did not modify sysctl property [net.inet.tcp.sendspace=1048576]
2024/08/22 23:47:31.774912 Did not modify sysctl property [net.link.generic.system.rcvq_maxlen=1024]
2024/08/22 23:47:31.792731 Did not modify SSHD configuration
2024/08/22 23:47:31.848976 Did not modify default [ConfigDataInstall]
2024/08/22 23:47:31.849036 Did not modify default [AutomaticallyInstallMacOSUpdates]
2024/08/22 23:47:31.849161 Did not modify default [AutomaticDownload]
2024/08/22 23:47:31.849292 Did not modify default [AutomaticCheckEnabled]
2024/08/22 23:47:31.884556 Successfully completed module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3) with message: system configuration completed with [0 changed / 1 unchanged /0 error(s)] out of 1 requested changes
2024/08/22 23:47:31.890440 Successfully completed module [UpdateMOTD] (type: motd, group: 3) with message: successfully updated motd file [/etc/motd] with version string [macOS Sonoma 14.5]
2024/08/22 23:47:31.898016 Did not modify default [CriticalUpdateInstall]
2024/08/22 23:47:31.898050 Successfully completed module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3) with message: system configuration completed with [0 changed / 14 unchanged / 0 error(s)] out of 14 requested changes
2024/08/22 23:47:31.928354 Successfully completed module [SetDefaultTimezone] (type: command, group: 3) with message: successfully ran command [[systemsetup -settimezone GMT]] with stdout [Set TimeZone: GMT] and stderr [2024-08-22 23:47:31.927 systemsetup[10242:88077] ### Error:-99 File:/AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2024/08/22 23:47:31.934127 Successfully completed module [RemoveSSHGroup] (type: command, group: 3) with message: successfully ran command [[/bin/zsh -c dscl /Local/Default delete /Groups/com.apple.access_ssh || true]] with stdout [delete: Invalid Path] and stderr [<dscl_cmd> DS Error: -14009 (eDSUnknownNodeName)]
2024/08/22 23:47:31.961197 Successfully completed module [DisableWiFi] (type: command, group: 3) with message: successfully ran command [[/bin/zsh -c wifidevice="$(networksetup -listallhardwareports |grep -A 1 "Wi-Fi" | tail -n 1 | cut -d " " -f2)"; if [[ ! -z $wifidevice ]]; then networksetup -setairportpower $wifidevice off; fi]] with stdout [] and stderr []
2024/08/22 23:47:31.978330 Successfully completed module [NeverSleepDisplay] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a displaysleep 0]] with stdout [] and stderr[]
2024/08/22 23:47:31.981452 Successfully completed module [DisableSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a disablesleep 1]] with stdout [] and stderr []
2024/08/22 23:47:31.997461 Successfully completed module [NeverSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a sleep 0]] with stdout [] and stderr []
2024/08/22 23:47:32.034087 Successfully completed module [SetAmazonTimeSync] (type: command, group: 3) with message: successfully ran command [[systemsetup -setusingnetworktime on -setnetworktimeserver 169.254.169.123]] with stdout [Network Time is already on.
setNetworkTimeServer: 169.254.169.123] and stderr [2024-08-22 23:47:32.033 systemsetup[10259:88082] ### Error:-99 File:/AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2024/08/22 23:47:32.111164 Error while running module [ManageEC2User] (type: usermanagement, group: 3) with message:  and err: ec2macosinit: failed to randomize password: ec2macosinit: unable to set secure password: ec2macosinit: failed to set ec2-user's password: exit status 67
2024/08/22 23:47:32.111209 Successfully completed processing of priority level 3
2024/08/22 23:47:32.111216 Writing instance history for instance i-0bf7783cd5199dc8d...
2024/08/22 23:47:32.133068 Successfully wrote instance history
2024/08/22 23:47:32.140094 Number of fatal retries (101) exceeded, exiting 0 to avoid infinite runs
2024/08/22 23:47:32.140113 Exiting after 1.152981375s due to failure in module [ManageEC2User] with FatalOnError set

It would be great if this were a soft failure, so that the other modules had a chance to complete.

I think the "bug" here is that a failure to set a password is treated as a retryable error, so it keeps retrying until it hits the 100-retry limit, and then the program exits. I think it's mainly bad luck or a race condition that this usually happens before the default ssh key can be installed.
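
To illustrate the behavior I'm asking for, here is a minimal Go sketch. It is not the actual ec2-macos-init code: the `module` type, `runAll`, `buildModules`, and the `InstallSSHKey` module name are all hypothetical, and it runs modules one after another rather than in priority groups. It only shows the difference between a failure that stops the run (as with FatalOnError in the log above) and one that is recorded while later modules still run.

```go
package main

import (
	"errors"
	"fmt"
)

// module is a hypothetical stand-in for one init module; it is not the tool's real type.
type module struct {
	name         string
	fatalOnError bool
	run          func() error
}

// runAll runs modules in order and returns the names that completed plus
// every error seen. A failing module with fatalOnError set stops the run;
// any other failure is recorded and the run continues.
func runAll(mods []module) (completed []string, errs []error) {
	for _, m := range mods {
		if err := m.run(); err != nil {
			errs = append(errs, fmt.Errorf("module %s: %w", m.name, err))
			if m.fatalOnError {
				return completed, errs
			}
			continue
		}
		completed = append(completed, m.name)
	}
	return completed, errs
}

// buildModules returns a simulated run in which password randomization always fails.
// passwordFatal controls whether that failure is treated as fatal.
func buildModules(passwordFatal bool) []module {
	return []module{
		{"ManageEC2User", passwordFatal, func() error {
			return errors.New("failed to randomize password: exit status 67")
		}},
		// Hypothetical name for whatever installs the default ssh key.
		{"InstallSSHKey", true, func() error { return nil }},
	}
}

func main() {
	for _, fatal := range []bool{true, false} {
		completed, errs := runAll(buildModules(fatal))
		fmt.Printf("fatalOnError=%v completed=%v errors=%d\n", fatal, completed, len(errs))
	}
}
```

With the flag set, the simulated key install never runs. With it cleared, the password error is still reported but the key install completes, which is the outcome I'd expect for an AMI that already has a password.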

andrewhamon, Aug 22 '24 23:08