xen-orchestra
Cloud-init NoCloud block device has wrong dosfslabel --> cloud-init can't find datasource and exits
Context
- XO origin: the sources
- Versions:
  - Node: v8.16.0
  - xo-web: 5.48
  - xo-server: 5.48
Expected behavior
When using a cloud-init template, the NoCloud datasource expects the block device containing the meta-data and user-data files to carry the filesystem label "cidata", e.g.:
debian@cloudbuster:~$ sudo blkid /dev/xvdb
/dev/xvdb: SEC_TYPE="msdos" LABEL="cidata" UUID="355A-4FC2" TYPE="vfat"
When booting with a correctly labeled disk, cloud-init will detect the disk correctly and process the meta-data and user-data files during boot and make necessary changes.
Current behavior
The generated block device xvdb looks like this:
debian@cloudbuster:~$ sudo blkid /dev/xvdb
/dev/xvdb: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" UUID="355A-4FC2" TYPE="vfat"
This leads to /usr/lib/cloud-init/ds-identify failing to detect the NoCloud datasource, and cloud-init exits with status code 1.
debian@cloudbuster:~$ sudo /usr/lib/cloud-init/ds-identify
debian@cloudbuster:~$ cat /run/cloud-init/ds-identify.log
[up 10.71s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
no datasource_list found, using default: MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
DMI_PRODUCT_NAME=HVM domU
DMI_SYS_VENDOR=Xen
DMI_PRODUCT_SERIAL=75995498-da6b-bae6-7b49-2bcf94a51bc1
DMI_PRODUCT_UUID=75995498-da6b-bae6-7b49-2bcf94a51bc1
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
FS_LABELS=
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/boot/vmlinuz-4.19.0-5-amd64 root=UUID=98709c7c-e899-43a8-becd-53adfee9b81a ro quiet
VIRT=xen
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=4.19.0-5-amd64
UNAME_KERNEL_VERSION=#1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08)
UNAME_MACHINE=x86_64
UNAME_NODENAME=cloudbuster
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=240 ppid=224
is_container=false
is_ds_enabled(IBMCloud) = true.
ec2 platform is 'Unknown'.
is_ds_enabled(IBMCloud) = true.
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 11.36s] returning 1
If I then manually set the correct label, e.g. with dosfslabel /dev/xvdb cidata, clean up cloud-init's temporary files (sudo cloud-init clean && rm -rf /run/cloud-init && sudo rm -rf /var/lib/cloud/) and reboot the machine, cloud-init detects the now properly labeled device and does its magic.
debian@cloudbuster:~$ sudo dosfslabel /dev/xvdb cidata
fatlabel: warning - lowercase labels might not work properly with DOS or Windows
debian@cloudbuster:~$ sudo blkid /dev/xvdb
/dev/xvdb: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" LABEL="cidata" UUID="355A-4FC2" TYPE="vfat"
debian@cloudbuster:~$ sudo rm -rf /var/lib/cloud/
debian@cloudbuster:~$ sudo rm -rf /run/cloud-init/
debian@cloudbuster:~$ sudo cloud-init clean
debian@cloudbuster:~$ sudo /usr/lib/cloud-init/ds-identify
debian@cloudbuster:~$ cat /run/cloud-init/ds-identify.log
[up 588.07s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
no datasource_list found, using default: MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
DMI_PRODUCT_NAME=HVM domU
DMI_SYS_VENDOR=Xen
DMI_PRODUCT_SERIAL=75995498-da6b-bae6-7b49-2bcf94a51bc1
DMI_PRODUCT_UUID=75995498-da6b-bae6-7b49-2bcf94a51bc1
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
FS_LABELS=cidata
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/boot/vmlinuz-4.19.0-5-amd64 root=UUID=98709c7c-e899-43a8-becd-53adfee9b81a ro quiet
VIRT=xen
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=4.19.0-5-amd64
UNAME_KERNEL_VERSION=#1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08)
UNAME_MACHINE=x86_64
UNAME_NODENAME=cloudbuster
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=1393 ppid=1392
is_container=false
is_ds_enabled(IBMCloud) = true.
check for 'NoCloud' returned found
ec2 platform is 'Unknown'.
is_ds_enabled(IBMCloud) = true.
Found single datasource: NoCloud
[up 589.18s] returning 0
I had a brief look at the library you are using, https://github.com/natevw/fatfs, but didn't see a way to set the label differently, so I'm not sure it can be done with this lib, but I might be wrong. Using tried and tested tools like dosfslabel might introduce a package dependency for XO, but it seems like the straightforward approach (to me, as a sysadmin type of guy...).
Last time I tried on Debian it worked. On which Debian version are you experiencing the issue?
@Fohdeesha can you take a look? It seems we made some changes "recently" (last 6 months) regarding Cloudinit, but I don't remember what you did.
@olivierlambert , thanks for the quick answer.
I used Debian 10.0 Buster.
You are right though, it seems to depend on the distro / version. I just tried your old Debian 8 template, the one you linked to in your blog post, and there the label is correct and it works as expected.
This is weird behaviour though (at least to me), since the block device is created by XO?!?
I can confirm that the distro is the culprit here. When I attach the config drive from the Debian 8 system to my Debian 10 system and check the label, it reports "LABEL_FATBOOT" instead of just "LABEL":
debian@cloudbuster:~$ sudo blkid /dev/xvdc
/dev/xvdc: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" UUID="355A-4FC2" TYPE="vfat"
After a quick search, it seems that util-linux (provides blkid) behaviour has changed. See commit f0ca7e80d7a171701d0d04a3eae22d97f15d0683 in the util-linux changelog.
Another thought, but I might be wrong: If I read the util-linux changelog correctly, LABEL_FATBOOT is used for labels stored in the boot sector of a disk.
I noticed that the disk generated by XO has no partition table: the filesystem sits directly on the disk (e.g. /dev/xvdb instead of /dev/xvdb1). If you created a block device with a fat16 partition instead of formatting the disk itself, maybe this would not happen?
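To make the boot sector vs. directory distinction concrete, here is a minimal Python sketch that reads both label locations from a FAT12/16 image. It assumes the standard FAT12/16 layout (11-byte label at offset 43 of the boot sector, volume label stored as a root directory entry with attribute bit 0x08) and assumes that newer blkid reports the first as LABEL_FATBOOT and the second as LABEL. The function names and the synthetic image are made up for illustration; this is not code from XO, fatfs or util-linux.

```python
import struct


def fat16_labels(image: bytes):
    """Return (boot_sector_label, root_dir_label) for a FAT12/16 image."""
    bytes_per_sector, = struct.unpack_from("<H", image, 11)
    reserved_sectors, = struct.unpack_from("<H", image, 14)
    num_fats = image[16]
    root_entries, = struct.unpack_from("<H", image, 17)
    sectors_per_fat, = struct.unpack_from("<H", image, 22)

    # Boot-sector copy of the label: 11 space-padded bytes at offset 43.
    boot_label = image[43:54].decode("ascii", "replace").rstrip()

    # The root directory follows the reserved area and all FAT copies.
    root_start = (reserved_sectors + num_fats * sectors_per_fat) * bytes_per_sector
    root_label = None
    for i in range(root_entries):
        entry = image[root_start + 32 * i:root_start + 32 * (i + 1)]
        if len(entry) < 32 or entry[0] == 0x00:  # end of directory
            break
        if entry[0] == 0xE5:  # deleted entry
            continue
        attr = entry[11]
        # 0x08 marks a volume label; 0x0F marks a long-file-name entry.
        if attr & 0x08 and (attr & 0x0F) != 0x0F:
            root_label = entry[:11].decode("ascii", "replace").rstrip()
            break
    return boot_label, root_label


def make_image(label=b"cidata     ", with_root_entry=False) -> bytes:
    """Build a tiny synthetic image (hypothetical geometry, demo only)."""
    bps, reserved, fats, root_entries, spf = 512, 1, 2, 16, 1
    image = bytearray(bps * (reserved + fats * spf + 1))
    struct.pack_into("<H", image, 11, bps)
    struct.pack_into("<H", image, 14, reserved)
    image[16] = fats
    struct.pack_into("<H", image, 17, root_entries)
    struct.pack_into("<H", image, 22, spf)
    image[43:54] = label
    if with_root_entry:
        root = (reserved + fats * spf) * bps
        image[root:root + 11] = label
        image[root + 11] = 0x08
    return bytes(image)


print(fat16_labels(make_image()))                      # ('cidata', None)
print(fat16_labels(make_image(with_root_entry=True)))  # ('cidata', 'cidata')
```

The first case matches what the thread observes on Buster: a label exists in the boot sector only, so a tool that looks for the directory entry finds nothing.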
Thanks for the various investigations and issues, will follow this :slightly_smiling_face:
@flipsa thanks a lot for the extensive feedback, this is the kind of input that really helps us! We'll continue to watch this and try to find the best approach that works with both old and current Debian/Ubuntu (or other distros, IDK about those).
I'll run tests on my side at the same time.
Thanks for looking into this guys!
As I am not sure about the disk vs. partition question (see my previous comment), I don't know what needs fixing: XO, fatfs or cloud-init....
However, a very easy and working solution might be to add these 2 lines to their shell script (ds-identify):
debian@cloudbuster:~$ diff /usr/lib/cloud-init/ds-identify.ORIGINAL /usr/lib/cloud-init/ds-identify
236a237,238
> LABEL_FATBOOT=*) label="${line#LABEL_FATBOOT=}";
> labels="${labels}${line#LABEL_FATBOOT=}${delim}";;
In case you think this is reasonable I could send the cloud-init folks a PR, I am just not sure if this use case (label in the boot sector of a disk) is a valid one or not...
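The effect of that two-line patch can be sketched in Python. This is only an illustration of the idea, not ds-identify's actual shell code: it parses the plain blkid output shown earlier in this thread, and the function name fs_labels and its keys parameter are made up for the example.

```python
import shlex


def fs_labels(blkid_line: str, keys=("LABEL",)):
    """Collect label values from one line of blkid output.

    Only tags listed in `keys` count as filesystem labels.
    """
    _, _, fields = blkid_line.partition(": ")
    labels = []
    for token in shlex.split(fields):
        key, _, value = token.partition("=")
        if key in keys and value not in labels:
            labels.append(value)
    return labels


buster = '/dev/xvdb: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" UUID="355A-4FC2" TYPE="vfat"'

# Unpatched behaviour: only LABEL is considered, so nothing is found.
print(fs_labels(buster))                                   # []
# Patched behaviour: LABEL_FATBOOT is accepted as well.
print(fs_labels(buster, keys=("LABEL", "LABEL_FATBOOT")))  # ['cidata']
```

An empty result corresponds to the FS_LABELS= line in the failing log, and ['cidata'] to FS_LABELS=cidata in the working one.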
Spoke to a cloud-init dev on irc and they requested I open a bug. I'll keep you posted here in case you don't want to follow the bug over there...
Thanks a lot (again!) If it's "fixed" upstream (ie in Cloudinit) that's even better!
Cloud-init dev "rharper" on IRC says that they will fix it, so that ds-identify will detect both fields in the future.
However, they also suggested that you / XO might wanna fix this on your side, too, by writing the "cidata" to both LABEL and LABEL_FATBOOT, to cover all bases...
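As a rough sketch of what "writing to both" could mean at the byte level, the Python below sets the boot-sector label and adds (or updates) a volume label entry in the root directory of a FAT12/16 image held in memory. It assumes the standard FAT12/16 layout and assumes that these two locations are what blkid reports as LABEL_FATBOOT and LABEL. XO uses the JavaScript fatfs library, so this is not how XO would do it; set_both_labels is a hypothetical helper for illustration only.

```python
import struct

ATTR_VOLUME_ID = 0x08


def set_both_labels(image: bytearray, label: str) -> None:
    """Write `label` to the boot sector and to a root directory entry."""
    raw = label.encode("ascii")
    if len(raw) > 11:
        raise ValueError("FAT labels are limited to 11 bytes")
    raw = raw.ljust(11)  # labels are space-padded to 11 bytes

    bps, = struct.unpack_from("<H", image, 11)
    reserved, = struct.unpack_from("<H", image, 14)
    fats = image[16]
    root_entries, = struct.unpack_from("<H", image, 17)
    spf, = struct.unpack_from("<H", image, 22)

    # 1. Boot-sector copy (FAT12/16: offset 43, 11 bytes).
    image[43:54] = raw

    # 2. Root directory: reuse an existing label entry, else the first free slot.
    root = (reserved + fats * spf) * bps
    target = None
    for i in range(root_entries):
        off = root + 32 * i
        first, attr = image[off], image[off + 11]
        if first in (0x00, 0xE5):  # never used / deleted
            if target is None:
                target = off
            if first == 0x00:  # nothing valid after this point
                break
            continue
        if attr & ATTR_VOLUME_ID and (attr & 0x0F) != 0x0F:
            target = off  # existing volume label: overwrite it
            break
    if target is None:
        raise RuntimeError("root directory is full")
    image[target:target + 32] = raw + bytes([ATTR_VOLUME_ID]) + bytes(20)


# Demo on a blank synthetic image (hypothetical geometry).
image = bytearray(2048)
struct.pack_into("<H", image, 11, 512)  # bytes per sector
struct.pack_into("<H", image, 14, 1)    # reserved sectors
image[16] = 2                           # number of FATs
struct.pack_into("<H", image, 17, 16)   # root directory entries
struct.pack_into("<H", image, 22, 1)    # sectors per FAT
set_both_labels(image, "cidata")
```

With both locations populated, old and new blkid versions should report a LABEL, which is the "cover all bases" outcome suggested above.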
The reason they think that's a good idea is how long it will take until the fix hits the distros, while in the meantime everybody who uses XO + cloud-init on systems with util-linux > 2.33-r1 (currently Debian Buster, Ubuntu Disco, Gentoo, probably more already, but I didn't really look) will be affected. I guess I agree with them, but I was also the one who got hit first ;)
If it's trivial on our side, yes, we can do this :) (I'd like to avoid changing/creating a partition etc.)
@olivierlambert could you please have a look at this: https://github.com/natevw/fatfs/issues/30#issuecomment-528129270
Sooo, I'm the only one having to also patch sources/DataSourceNoCloud.py?
Patching ds-identify allows the NoCloud datasource to be selected correctly, but then the datasource doesn't detect any data, which isn't very surprising since DataSourceNoCloud also calls blkid to find the device path (a bit sad/strange that the information isn't provided by ds-identify...)
--- DataSourceNoCloud.py.org 2019-12-18 17:36:50.476000000 +0100
+++ DataSourceNoCloud.py 2019-12-18 17:37:46.544000000 +0100
@@ -107,6 +107,7 @@
fslist.extend(util.find_devs_with("TYPE=iso9660"))
label_list = util.find_devs_with("LABEL=%s" % label)
+ label_list.extend( util.find_devs_with("LABEL_FATBOOT=%s" % label))
devlist = list(set(fslist) & set(label_list))
devlist.sort(reverse=True)
edit: fix the patch, add missing parenthesis at the end of line
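For readers who don't know that part of cloud-init, the selection logic the patch touches can be sketched as follows. This is a simplified stand-in, not the real DataSourceNoCloud code: find_devs_with is passed in as a parameter, the fake lookup table replaces blkid, and starting fslist from TYPE=vfat is an assumption based on the context lines of the diff.

```python
def find_nocloud_devices(find_devs_with, label="cidata",
                         label_keys=("LABEL", "LABEL_FATBOOT")):
    """Return candidate devices: right filesystem type AND right label."""
    fslist = find_devs_with("TYPE=vfat") + find_devs_with("TYPE=iso9660")
    label_list = []
    for key in label_keys:
        label_list.extend(find_devs_with("%s=%s" % (key, label)))
    # Same intersection and ordering as in the patched snippet above.
    return sorted(set(fslist) & set(label_list), reverse=True)


# Fake device table standing in for blkid on a Debian Buster guest.
FAKE_BLKID = {
    "/dev/xvda1": {"TYPE": "ext4"},
    "/dev/xvdb": {"TYPE": "vfat", "LABEL_FATBOOT": "cidata"},
}


def fake_find_devs_with(criteria):
    key, _, value = criteria.partition("=")
    return [dev for dev, tags in FAKE_BLKID.items() if tags.get(key) == value]


print(find_nocloud_devices(fake_find_devs_with, label_keys=("LABEL",)))  # []
print(find_nocloud_devices(fake_find_devs_with))                         # ['/dev/xvdb']
```

With only LABEL in the search the config drive is never found, which matches the observation that patching ds-identify alone is not enough.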
@bplessis No, I don't think so. I recall we had two different customers who still could not get NoCloud to pass network config data even after applying the first fix. Your explanation would certainly make sense as to why.
Is there an update on XOA side for this issue? I was trying to figure out why my templates were noop and finally hit the correct keywords to reach this bug.
I'm not aware of any progress. @Fohdeesha will check upstream Cloudinit to see whether anything has changed 👍
there hasn't been any movement on the upstream cloud-init issue, it was triaged but never assigned - https://bugs.launchpad.net/cloud-init/+bug/1841466
I may be able to submit a patch, but I doubt it would be backported to stable images. So xcp-ng still has to set both labels, as flipsa said in a previous comment.
Same issue with Ubuntu 20.04. Are there any changes in progress?
@bplessis I tested your method but it didn't work.
The problem is that the cloud-init folks are supposed to fix this, but from what I've seen, nobody has volunteered to do it yet. Everything we do at the customer level (ds-identify, Xen Orchestra [fatfs] settings) will not be officially supported and will be considered a kludge until there is a definitive solution. Any news? This PR is about to enter the official fatfs code, and consequently XO.
I suppose that as soon as fatfs gets the ability to create a volume label, it would be trivial to fix (if I understood correctly).
Also, if you have a pro support subscription, creating a support ticket might speed things up regarding our priorities :+1:
ok, thanks!
I tested your method but didn't work.
You need my patch AND the ds-identify patch, it was working on debian/buster at least
ok, it worked for me!
I'm trying to get this into the cloud-init repository. I have now sent patches to cloud-init so that its scripts check both LABEL and LABEL_FATBOOT, based on tips from @flipsa and @bplessis.
I hope to see this resolved as soon as possible. You can check my update here: https://bugs.launchpad.net/cloud-init/+bug/1841466/comments/5
@marlluslustosa according to https://launchpad.net/cloud-init, the official repository is https://github.com/canonical/cloud-init. A PR to that repository might be handled faster.
According to https://github.com/canonical/cloud-init/pull/513 the issue was fixed upstream on cloud-init. Now it's up to vendors to backport the changes.
Fantastic news! Thanks for the feedback @braiam