sumaform

Print cloud-init logs when a deployment fails

Open srbarrios opened this issue 2 years ago • 8 comments

@rjmateus Can we print cloud-init logs when a deployment fails in that way?

Maybe we need to contribute this to the libvirt provider? Somewhere around here?
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/resource_libvirt_domain.go
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/resource_libvirt_cloud_init.go
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/cloudinit_def.go

The idea is to append it to this error message:

│ Error: Error: couldn't retrieve IP address of domain.Please check following:
│ 1) is the domain running proplerly?
│ 2) has the network interface an IP address?
│ 3) Networking issues on your libvirt setup?
│  4) is DHCP enabled on this Domain's network?
│ 5) if you use bridge network, the domain should have the pkg qemu-agent installed
│ IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup
│  timeout while waiting for state to become 'all-addresses-obtained' (last state: 'waiting-addresses', timeout: 5m0s)
│
│   with module.cucumber_testsuite.module.debian-minion.module.minion.module.host.libvirt_domain.domain[0],
│   on /home/jenkins/jenkins-build/workspace/manager-Head-dev-acceptance-tests-NUE/results/sumaform/backend_modules/libvirt/host/main.tf line 68, in resource "libvirt_domain" "domain":
│   68: resource "libvirt_domain" "domain" {

srbarrios, Mar 04 '22 12:03

You mean that we have colored output?

nodeg, Mar 04 '22 12:03

You mean that we have colored output?

Hahaha, no, no: I want the files /var/log/cloud-init.log and /var/log/cloud-init-output.log printed as part of the sumaform deployment output, together with the message that I shared.

These files can give us very useful information about the deployment failure.

(I wouldn't object to colors in any case :laughing:)

srbarrios, Mar 04 '22 12:03

See an example of the information they provide:

Cloud-init v. 20.2-8.48.1 running 'modules:config' at Mon, 28 Feb 2022 10:09:23 +0000. Up 32.66 seconds.
Retrieving repository 'os_pool_repo' metadata [.done]
Building repository 'os_pool_repo' cache [....done]
All repositories have been refreshed.
Loading repository data...
Reading installed packages...
'qemu-guest-agent' is already installed.
Package 'qemu-guest-agent' is not available in your repositories. Cannot reinstall, upgrade, or downgrade.
Resolving package dependencies...

The following 5 NEW packages are going to be installed:
  avahi libavahi-common3 libavahi-core7 libdaemon0 nss-mdns

5 new packages to install.
Overall download size: 348.8 KiB. Already cached: 0 B. After the operation, additional 894.7 KiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
Retrieving package libavahi-common3-0.8-150400.4.4.x86_64 (1/5),  42.3 KiB ( 51.2 KiB unpacked)
Retrieving: libavahi-common3-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package libdaemon0-0.14-1.23.x86_64 (2/5),  29.4 KiB ( 59.4 KiB unpacked)
Retrieving: libdaemon0-0.14-1.23.x86_64.rpm [done]
Retrieving package libavahi-core7-0.8-150400.4.4.x86_64 (3/5), 101.5 KiB (220.8 KiB unpacked)
Retrieving: libavahi-core7-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package avahi-0.8-150400.4.4.x86_64 (4/5), 136.0 KiB (431.2 KiB unpacked)
Retrieving: avahi-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package nss-mdns-0.14.1-150400.8.3.x86_64 (5/5),  39.7 KiB (132.0 KiB unpacked)
Retrieving: nss-mdns-0.14.1-150400.8.3.x86_64.rpm [done]

Checking for file conflicts: [.......done]
(1/5) Installing: libavahi-common3-0.8-150400.4.4.x86_64 [.....done]
(2/5) Installing: libdaemon0-0.14-1.23.x86_64 [.....done]
(3/5) Installing: libavahi-core7-0.8-150400.4.4.x86_64 [..........done]
(4/5) Installing: avahi-0.8-150400.4.4.x86_64 [...........done]
Additional rpm output:
Updating /etc/sysconfig/avahi ...
Created symlink /etc/systemd/system/dbus-org.freedesktop.Avahi.service -> /usr/lib/systemd/system/avahi-daemon.service.
Created symlink /etc/systemd/system/multi-user.target.wants/avahi-daemon.service -> /usr/lib/systemd/system/avahi-daemon.service.
Created symlink /etc/systemd/system/sockets.target.wants/avahi-daemon.socket -> /usr/lib/systemd/system/avahi-daemon.socket.


(5/5) Installing: nss-mdns-0.14.1-150400.8.3.x86_64 [..........done]
Failed to start qemu-ga@virtio\x2dports-org.qemu.guest_agent.0.service: Unit qemu-ga@virtio\x2dports-org.qemu.guest_agent.0.service not found.
Cloud-init v. 20.2-8.48.1 running 'modules:final' at Mon, 28 Feb 2022 10:09:25 +0000. Up 34.03 seconds.
2022-02-28 10:09:31,188 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [5]
2022-02-28 10:09:31,195 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2022-02-28 10:09:31,196 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
ci-info: no authorized SSH keys fingerprints found for user sles.
Cloud-init v. 20.2-8.48.1 finished at Mon, 28 Feb 2022 10:09:31 +0000. Datasource DataSourceNoCloud [seed=/dev/sr0][dsmode=net].  Up 40.28 seconds
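Most of that output is package-manager noise; the lines that explain the failure are the timestamped warnings near the end. As a minimal sketch (not something proposed in this thread), those entries could be pulled out before being appended to an error message. It assumes the `date - file.py[LEVEL]: message` line format visible in the log above; `problem_entries` and `LogEntry` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Sequence

# Matches timestamped lines like (format assumed from the log above):
#   2022-02-28 10:09:31,188 - util.py[WARNING]: Failed running ...
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<source>[^\s\[]+)\[(?P<level>[A-Z]+)\]: (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    source: str
    level: str
    msg: str


def problem_entries(text: str,
                    levels: Sequence[str] = ("WARNING", "ERROR", "CRITICAL")) -> List[LogEntry]:
    """Return timestamped entries whose level is in `levels`.

    Untimestamped lines (zypper progress, 'Cloud-init v. ...' banners,
    ci-info output) do not match LOG_LINE and are skipped.
    """
    entries = []
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m and m.group("level") in levels:
            entries.append(LogEntry(**m.groupdict()))
    return entries


if __name__ == "__main__":
    sample = """\
Cloud-init v. 20.2-8.48.1 running 'modules:final' at Mon, 28 Feb 2022 10:09:25 +0000. Up 34.03 seconds.
2022-02-28 10:09:31,188 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [5]
2022-02-28 10:09:31,195 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
ci-info: no authorized SSH keys fingerprints found for user sles.
"""
    for e in problem_entries(sample):
        print(f"{e.ts} {e.source}: {e.msg}")
```

Filtering on the level field rather than on the word "Failed" keeps lines such as the `Failed to start qemu-ga@...` systemd message out of the summary, since that line carries no timestamp or level.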

srbarrios, Mar 04 '22 12:03

@srbarrios we don't control the log from the sumaform side; that would need changes to the Terraform libvirt provider. I'm not sure this issue should be reported here. Upstream may not accept a change like this, because this is not a domain creation problem but a cloud-init problem inside the machine (I don't even know if the provider can read data from the disk inside the machine). As far as I know, all the data the provider retrieves is obtained from libvirt.

rjmateus, Mar 10 '22 17:03

@rjmateus it's true that we don't control it, but we could still just echo it from a Salt script (if we ever get to that stage).

moio, Mar 14 '22 09:03

@moio Good point. Do you think we should always echo that, and should it be the first state to apply (since cloud-init runs at start-up)?

rjmateus, Mar 14 '22 11:03

Please give it a try; if you find it too verbose, it could be put behind a flag.
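A minimal sketch of what such an echo step could look like, written as a small Python script that a Salt state might run on the minion (for example through a `cmd.run` state; how it would be wired into sumaform's states is an assumption and not shown). The `--tail` option stands in for the "flag" mentioned above and is hypothetical: without it both files are echoed in full, with `--tail N` only the last N lines of each. As noted earlier in the thread, this only helps once Salt can actually reach the VM.

```python
import argparse
import sys
from collections import deque
from pathlib import Path
from typing import Iterable, Optional, TextIO

# The two log files named earlier in this thread.
DEFAULT_LOGS = ("/var/log/cloud-init.log", "/var/log/cloud-init-output.log")


def dump_cloud_init_logs(paths: Iterable[str], tail: Optional[int] = None,
                         out: Optional[TextIO] = None) -> int:
    """Write each log under a '==> path <==' header; return how many were read.

    With tail=None every line is written; with tail=N only the last N lines.
    """
    out = out if out is not None else sys.stdout
    found = 0
    for p in paths:
        path = Path(p)
        out.write(f"==> {path} <==\n")
        if not path.is_file():
            out.write("(not found)\n")
            continue
        try:
            with path.open(encoding="utf-8", errors="replace") as fh:
                # deque(..., maxlen=None) keeps everything; maxlen=N keeps the last N lines.
                lines = deque(fh, maxlen=tail)
        except OSError as exc:  # e.g. permission denied when not run as root
            out.write(f"(unreadable: {exc})\n")
            continue
        found += 1
        out.writelines(lines)
        if lines and not lines[-1].endswith("\n"):
            out.write("\n")
    return found


def main(argv: Optional[list] = None) -> int:
    parser = argparse.ArgumentParser(description="Echo cloud-init logs.")
    # Hypothetical verbosity flag: only show the last N lines of each file.
    parser.add_argument("--tail", type=int, default=None, metavar="N")
    parser.add_argument("paths", nargs="*", default=list(DEFAULT_LOGS))
    args = parser.parse_args(argv)
    if args.tail is not None and args.tail < 0:
        parser.error("--tail must be non-negative")
    dump_cloud_init_logs(args.paths, tail=args.tail)
    # Exit 0 even if a file is missing, so echoing logs never fails the state itself.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning 0 unconditionally is a deliberate choice in this sketch: the echo is diagnostic output, so a missing or unreadable log should not turn into a second, misleading deployment failure.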

moio, Mar 16 '22 09:03

I'm not sure this is possible on the libvirt provider side: it has no idea whether cloud-init is used or not.

Even assuming that cloud-init is used, the provider would need access to the inside of the VM to read the logs. And we are precisely in a situation where it is impossible to get inside the VM, be it via the network or via qemu-agent...

Bischoff, Mar 31 '22 08:03