packer-maas icon indicating copy to clipboard operation
packer-maas copied to clipboard

When deploying custom-ubuntu.tar.gz, the machine stays with "Deploying" even after "Installation complete"

Open kojiwell opened this issue 1 year ago • 5 comments

When I make and then deploy a custom-ubuntu.tar.gz to a baremetal machine on our MAAS, the machine somehow stays "Deploying" even after "Installation complete - Node disabled netboot" is logged in the Event log. The machine doesn't respond to ping. And 30 minutes later, the machine will be marked as "Failed deployment."

I checked the machine's status via a remote console; the OS was up; I could log in. However, the /etc/netplan/50-cloud-init.yaml doesn't exist, which is supposed to be generated during the deployment. So, there seems to be an issue on finalizing the deployment process after the installed OS is booted.

Here's the snippet of the Event log around the end of deployment.


Mon, 20 May. 2024 22:36:56 | Node changed status - From 'Deploying' to 'Failed deployment' Mon, 20 May. 2024 22:36:56 | Marking node failed - Node operation 'Deploying' timed out after 30 minutes. Mon, 20 May. 2024 22:06:12 | Script result - /tmp/install.log changed status from 'Running' to 'Passed' Mon, 20 May. 2024 22:06:12 | Rebooting Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-power_state_change with frequency once-per-instance Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-final_message with frequency always Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-ssh_authkey_fingerprints with frequency once-per-instance Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-install_hotplug with frequency once-per-instance Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-keys_to_console with frequency once-per-instance Mon, 20 May. 2024 22:06:10 | Installation complete - Node disabled netboot


Here's the bottom lines in the Installation output. Curtin seems to have finished installation without an issue.

Saving to: ‘/dev/null’

     0K                                                       100%  142K=0s

2024-05-14 05:39:23 (142 KB/s) - ‘/dev/null’ saved [2/2]

curtin: Installation finished.

Here's the versions of related apps.

~# packer --version
Packer v1.10.0

~# snap info maas|grep installed
installed:          3.5.0~rc4-16292-g.18b753d78              (35434) 196MB -

packer-maas# git branch
* main

Here's how I have registered the ubuntu-custom image.

git clone https://github.com/canonical/packer-maas
cd packer-maas/ubuntu
make custom-ubuntu.tar.gz
maas $MAAS_USER boot-resources create name='custom/ubuntu2204' title='custom/ubuntu2204' architecture='amd64/generic' filetype='tgz' base_image='ubuntu/jammy' content@=custom-ubuntu.tar.gz

Thank you for your attention. I will appreciate any comments.

kojiwell avatar May 22 '24 02:05 kojiwell

can you run the following commands in the deployed machine?

  • cloud-init status --format=yaml
  • cloud-init query userdata

Also get the content of /var/log/cloud-init.log and /var/log/cloud-init-output.log

alexsander-souza avatar May 22 '24 14:05 alexsander-souza

Thank you, @alexsander-souza

Here're the snippets of the outputs you asked, all of which indicate that the machine is unable to communicate with the MAAS.

  • cloud-init status --format=yaml
---
_schema_version: '1'
boot_status_code: enabled-by-generator
datasource: none
detail: DataSourceNone
errors: []
extended_status: degraded done
init:
    errors: []
    finished: 1716416103.0340788
    recoverable_errors: {}
    start: 1716416102.9233449
init-local:
    errors: []
    finished: 1716416099.5163546
    recoverable_errors:
        WARNING:
        - 'Failed posting event: {"name": "init-local/check-cache", "description":
            "attempting to read from cache [trust]", "event_type": "start", "origin":
            "cloudinit", "timestamp": 1716416099.4795318}. This was caused by: HTTPConnectionPool(host=''172.16.40.251'',
            port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb
            (Caused by NewConnectionError(''<urllib3.connection.HTTPConnection object
            at 0x7f4b7c4835b0>: Failed to establish a new connection: [Errno 101]
            Network is unreachable''))'
        - 'Failed posting event: {"name": "init-local/check-cache", "description":
            "restored from cache: DataSourceNone", "event_type": "finish", "origin":
            "cloudinit", "timestamp": 1716416099.482275, "result": "SUCCESS"}. This
            was caused by: HTTPConnectionPool(host=''172.16.40.251'', port=5248):
            Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by
            NewConnectionError(''<urllib3.connection.HTTPConnection object at 0x7f4b7c4813f0>:
            Failed to establish a new connection: [Errno 101] Network is unreachable''))'
    start: 1716416099.4651775
last_update: Wed, 22 May 2024 22:15:06 +0000
 :
 :
  • cloud-init query userdata
#cloud-config
growpart:
  mode: 'off'
locale: en_US.UTF-8
preserve_hostname: true
resize_rootfs: false
ssh_pwauth: true
users:
- gecos: ubuntu
  groups: adm,cdrom,dip,lxd,plugdev,sudo
  lock_passwd: false
  name: ubuntu
  passwd: $6$canonical.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  shell: /bin/bash
  • /var/log/cloud-init.log
 :
 :
2024-05-22 22:15:06,302 - handlers.py[DEBUG]: Queuing POST to http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb, data: {'name': 'modules-final', 'description': 'running modules for final', 'event_type': 'finish', 'origin': 'cloudinit', 'timestamp': 1716416106.3022237, 'result': 'SUCCESS'}
2024-05-22 22:15:06,302 - handlers.py[DEBUG]: WebHookHandler flushing remaining events
2024-05-22 22:15:06,308 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,312 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,317 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,320 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,325 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,329 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,334 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,339 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,342 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,347 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,351 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,355 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,360 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,363 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,368 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,373 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,377 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,380 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,385 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,389 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,410 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
  • /var/log/cloud-init-output.log
 :
 :
2024-05-22 22:14:59,498 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "attempting to read from cache [trust]", "event_type": "start", "origin": "cloudinit", "timestamp": 1716416099.4795318}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c4835b0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
2024-05-22 22:14:59,498 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "restored from cache: DataSourceNone", "event_type": "finish", "origin": "cloudinit", "timestamp": 1716416099.482275, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c4813f0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
2024-05-22 22:14:59,517 - handlers.py[WARNING]: Failed posting event: {"name": "init-local", "description": "searching for local datasources", "event_type": "finish", "origin": "cloudinit", "timestamp": 1716416099.516621, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c3a98d0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
 :
 :

Please note that the host '172.16.40.251' is actually reachable because I added the following lines in packer-maas/ubuntu/scripts/install-custom-packages and created the custom image again to become able to ssh to the deployed machine.

cat << EOF > /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp7s0f0:
      dhcp4: true
  version: 2
EOF

Here's the ping result.

ubuntu@ubuntu:~$ ping -c 3 172.16.40.251
PING 172.16.40.251 (172.16.40.251) 56(84) bytes of data.
64 bytes from 172.16.40.251: icmp_seq=1 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=2 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=3 ttl=64 time=0.672 ms

--- 172.16.40.251 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2048ms
rtt min/avg/max/mdev = 0.672/0.733/0.764/0.043 ms

This problem doesn't happen with the official Ubuntu image on MAAS, so there are some differences between the official and custom images (e.g. missing required apt packages, etc.)?

Thank you!

kojiwell avatar May 22 '24 23:05 kojiwell

By the way, I've also created and registered the custom rocky8 and 9 tar.gz images in the same way, which get deployed without a problem. So Rocky is fine.

This problem seems unique to the custom ubuntu (and maybe debian as well).

kojiwell avatar May 23 '24 05:05 kojiwell

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jun 23 '24 01:06 github-actions[bot]

this was probably fixed by #242, please try again

alexsander-souza avatar Jun 25 '24 21:06 alexsander-souza

@alexsander-souza I just tried it again, and it looks like the problem is resolved. Thank you so much!

kojiwell avatar Aug 28 '24 08:08 kojiwell