`boot_device` sugar is not supported on s390x
If boot_device.mirror is specified, Butane emits Ignition directives to repartition the entire boot disk(s), create RAID volumes, and create filesystems inside them. During first boot, the OS copies the entire OS contents into memory, does the repartitioning, and copies the contents back to disk. The partition tables created by Butane are hardcoded to match what the OS expects, which is slightly different for each architecture, and so Butane needs to know the CPU architecture via the boot_device.layout field (which defaults to x86_64).
If boot_device.luks is specified, Butane doesn't need to emit directives for repartitioning the entire disk, but it does need to locate the existing root partition so it can create a LUKS volume in it and a new filesystem inside that. (At runtime, the OS still does the copy to RAM and copy back to disk.) To do this, it references the partition by partition label (/dev/disk/by-partlabel/root) so that it doesn't need to know the number of the root partition. Since we're not repartitioning the disk, the layout directive is technically not required when only using boot_device.luks.
If both mirror and luks are specified, Butane does a combination of both.
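To make the three cases above concrete, here is a rough Go sketch of the decision the sugar makes for the root filesystem. The `bootDevice` type, the `planDirectives` function, and the `/dev/md/md-root` RAID device name are assumptions for illustration, not Butane's actual code. Boot and EFI partitions are left out, and each step is a prose description rather than a real Ignition directive.

```go
package main

import "fmt"

// bootDevice is a hypothetical, simplified model of the boot_device section.
// It is NOT Butane's real type.
type bootDevice struct {
	Layout      string   // boot_device.layout; empty means the x86_64 default
	MirrorDisks []string // boot_device.mirror.devices
	Luks        bool     // whether boot_device.luks is set
}

// planDirectives describes, in prose, what the sugar would emit for the
// root filesystem. Boot and EFI partitions are deliberately omitted.
func planDirectives(bd bootDevice) []string {
	if len(bd.MirrorDisks) == 0 && !bd.Luks {
		return nil // no sugar requested, nothing to emit
	}
	layout := bd.Layout
	if layout == "" {
		layout = "x86_64" // boot_device.layout defaults to x86_64
	}
	var steps []string
	// Without mirroring, the existing root partition is found by its label,
	// so neither the layout nor the partition number matters.
	root := "/dev/disk/by-partlabel/root"
	if len(bd.MirrorDisks) > 0 {
		// Mirroring repartitions every disk, so the hardcoded,
		// architecture-specific partition table is needed here.
		for _, disk := range bd.MirrorDisks {
			steps = append(steps, fmt.Sprintf("repartition %s with the %s layout", disk, layout))
		}
		steps = append(steps, "create a RAID1 volume from the root partitions")
		root = "/dev/md/md-root" // assumed RAID device name
	}
	if bd.Luks {
		steps = append(steps, fmt.Sprintf("create a LUKS volume on %s", root))
		root = "/dev/mapper/root"
	}
	steps = append(steps, fmt.Sprintf("create the root filesystem on %s", root))
	return steps
}

func main() {
	plan := planDirectives(bootDevice{
		MirrorDisks: []string{"/dev/sda", "/dev/sdb"},
		Luks:        true,
	})
	for _, step := range plan {
		fmt.Println(step)
	}
}
```

The LUKS-only path never consults the layout, which is why `layout` is technically optional in that case.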
That all works fine for x86_64, aarch64, and ppc64le, since they all use GPT partition tables. But s390x uses different partition table formats depending on the type of disk. On FBA DASD disks, it uses MBR partition tables, which Ignition doesn't know how to create and which don't have partition labels. On ECKD DASD disks, it uses the DASD native partitioning format, which Ignition doesn't know how to create, which doesn't support partition labels, and which supports only 3 partitions per disk.
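The per-format constraints can be written down as data. This Go sketch is illustrative only: the `partitionTable` type, the disk-type keys, and the `sugarSupport` helper are hypothetical, and the rule it encodes (mirroring needs a table Ignition can create; LUKS-only sugar needs a partition labeled `root`) is a simplification of the explanation above.

```go
package main

import "fmt"

// partitionTable records the properties discussed above. Hypothetical type.
type partitionTable struct {
	Name            string
	IgnitionCreates bool // can Ignition create this table format?
	HasLabels       bool // are partition labels (e.g. "root") supported?
	MaxPartitions   int  // 0 = limit not modeled in this sketch
}

var (
	gpt  = partitionTable{Name: "GPT", IgnitionCreates: true, HasLabels: true}
	mbr  = partitionTable{Name: "MBR"}
	dasd = partitionTable{Name: "DASD native", MaxPartitions: 3}
)

// diskTables maps a hypothetical disk-type key to its partition table.
var diskTables = map[string]partitionTable{
	"x86_64":          gpt,
	"aarch64":         gpt,
	"ppc64le":         gpt,
	"s390x-vm-gpt":    gpt, // s390x VMs that use GPT partition tables
	"s390x-fba-dasd":  mbr,
	"s390x-eckd-dasd": dasd,
}

// sugarSupport reports whether the existing sugar approach can work:
// mirroring needs Ignition to create the table, and LUKS-only sugar needs
// to find the root partition by its "root" label.
func sugarSupport(disk string) (mirror, luksByLabel bool, err error) {
	t, ok := diskTables[disk]
	if !ok {
		return false, false, fmt.Errorf("unknown disk type %q", disk)
	}
	return t.IgnitionCreates, t.HasLabels, nil
}

func main() {
	for _, disk := range []string{"x86_64", "s390x-fba-dasd", "s390x-eckd-dasd"} {
		m, l, _ := sugarSupport(disk)
		fmt.Printf("%-16s mirror=%v luks-by-label=%v\n", disk, m, l)
	}
}
```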
So we could technically have three different layout values, e.g.:
- `s390x-virt`: works like the other arches
- `s390x-fba`: doesn't support `mirror`; supports `luks` by hardcoding a partition number (which requires a field for specifying the boot disk, e.g. `/dev/sda`)
- `s390x-eckd`: same constraints as `fba`, but with different hardcoded constants
but that would be confusing.
Ignition and the OS copy-to-RAM/copy-to-disk code should work fine on s390x; it's just that the Butane sugar doesn't know how to configure them. For now, all users on s390x should bypass the boot_device sugar and manually configure LUKS and/or mirroring using the low-level directives, similar to how an encrypted/mirrored data volume would be configured. Do not use boot_device with the default x86_64 layout on s390x, even in VMs where it appears to work, since the x86_64 layout is not guaranteed to remain compatible with the needs of s390x.
On VMs using GPT partition tables this might look like:
variant: fcos
version: 1.5.0
storage:
  luks:
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      wipe_volume: true
      clevis:
        tang:
          - url: http://example.com/
            thumbprint: ...
  filesystems:
    - device: /dev/mapper/root
      format: xfs
      label: root
      wipe_filesystem: true
/dev/disk/by-partlabel/root will only work in VMs using GPT partition tables. Other values will need to be used on DASD disks.
Hi @bgilbert
here is the proposal to add boot_device sugar for s390x.
So far, testing has been done on vda, zfcp, and ECKD DASD. We have not done any LUKS disk encryption testing on FBA DASD.
So devices like vda, zfcp, and ECKD DASD can be configured with a partition number. For example, if we use "layout: s390x-zfcp", Butane generates the config below; the same applies to "layout: s390x-eckd" and "layout: s390x-virt".
Layout s390x-zfcp
# butane worker-storage.bu -o worker-storage.yaml
Content of worker-storage.bu
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1
# butane worker-storage.bu -o test-worker-storage.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/sda4
          label: luks-root
          name: root
          wipeVolume: true
For ECKD DASD disks
Layout s390x-eckd
# butane worker-storage.bu -o worker-storage.yaml
Content of worker-storage.bu
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1
# butane worker-storage.bu -o test-worker-storage.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/dasda4
          label: luks-root
          name: root
          wipeVolume: true
For virtual disks (s390x-virt)
Layout s390x-virt
# butane worker-storage.bu -o worker-storage.yaml
Content of worker-storage.bu
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-virt
  luks:
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1
# butane worker-storage.bu -o test-worker-storage.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/vda4
          label: luks-root
          name: root
          wipeVolume: true
In the zfcp layout, it looks like you're hardcoding /dev/sda. Is that a safe assumption to make, or should we add a Butane field for specifying the /dev/sda part?
boot_device:
  layout: s390x-zfcp
  luks:
    # Only permitted for layouts that use it. Should probably be here and not directly
    # under boot_device, to avoid a semantic conflict with the device list in the mirror
    # section.
    device: /dev/sda
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
Similarly for DASD and /dev/dasda. And also, shouldn't we be using partition 2?
For virt setups, we can continue to use /dev/disk/by-partlabel, right? If so, we probably should, to avoid unnecessary hardcoding of partition details.
In the DASD and zfcp cases, we should make sure to fail if a mirror configuration is specified.
I was trying to show that Butane generates the device as /dev/sda.
That's right: /dev/sda for zfcp and /dev/dasda for ECKD, set in boot_device.luks.device as below. For virtual devices, the default boot_device sugar works.
I will also add a check so that the s390x-eckd and s390x-zfcp configs are generated only with boot_device.luks, and conversion fails if a boot_device.mirror configuration is specified for those layouts.
zfcp
butane worker-storage.bu -o worker-storage.yaml
Content of worker-storage.bu
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sda
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
eckd-dasd
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasda
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
So, to be explicit: is it 100% certain that the user will always want /dev/sda and /dev/dasda respectively? Or is it possible that they'll want e.g. /dev/sdb or /dev/dasdb? If it's 100% certain, then we don't need the new device field. But if there's any chance that they'll want a different device, we should add the field.
And to be clear, we'll still need a separate s390x-virt layout even if we're using /dev/disk/by-partlabel. It's not safe to have users default to the x86_64 layout, since there might be arch-specific differences in the future.
For example, when a user writes boot_device sugar, the Butane template looks like this:
variant: openshift
version: x.xx.x
metadata:
  name: <name>
  labels:
    machineconfiguration.openshift.io/role: <node-name (string)>
boot_device:
  layout: <arch (string)>
  luks:
    device: <string>  # /dev/sda or /dev/dasda; optional, defaults to /dev/sda
    tang:
      - url: <string>
        thumbprint: <string>
/dev/sda is an example for zfcp, and /dev/dasda for ECKD. So the user must specify layout: s390x-zfcp, and device: <string> (/dev/sd[a-z]) is optional (thanks for the idea _/\_); otherwise it uses the default /dev/sda?
Similarly for dasd.
And a check to ensure that s390x-zfcp accepts only the SCSI naming convention (sd[a-z]), and similarly for s390x-eckd.
For zKVM, we can still provide layout: s390x-virt, which uses the same semantics explained for SCSI and DASD and defaults the device to /dev/vda unless it is specified in device: <>?
Having a specific layout like s390x-virt reduces confusion.
Please directly answer the question I asked in https://github.com/coreos/butane/issues/453#issuecomment-1652349302: is it true or false that the user will sometimes want to install on a disk other than the first one?
If we add a device field, it should be forbidden for layouts that don't support it, and mandatory for layouts that do. Otherwise the user will likely forget to set it.
I think you're right that we should require a /dev/sd prefix for zfcp and /dev/dasd for ECKD. It'll prevent mistakes, and we can always relax that restriction later.
For the s390x-virt layout, as I've been saying, we shouldn't allow the device field and shouldn't hardcode a /dev/vd* device. Instead, we should use /dev/disk/by-partlabel/root as the other arches do. That approach avoids the need to specify any device at all, and in the KVM case there's no reason not to do that.
Is it 100% certain that the user will always want /dev/sda and /dev/dasda respectively? -> False
(I can't say it's true, because it depends on the user's requirements.)
I have a question related to the one above. If we use boot_device sugar, Butane automatically generates Ignition with a valid device; for example, on x86 it is /dev/disk/by-partlabel/root.
However, for s390x the /dev/sd[] device needs to be provided manually in that case, right? Because Butane doesn't know how many disks are present in the VM node.
If we add a device field, it should be forbidden for layouts that don't support it, and mandatory for layouts that do. Otherwise the user will likely forget to set it. - Will do that.
Will add an s390x-virt layout, something like below.
variant: openshift
......
......
boot_device:
  layout: s390x-virt
  luks:
    tang:
      - url:
Butane should then generate the following output:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: QcPr_NHFJammnRCA3fFMVdNBwjs
                url: http://12.23.21.58:7500
          device: /dev/disk/by-partlabel/root
          label: luks-root
          name: root
          options:
            - --cipher
            - aes-cbc-essiv:sha256
          wipeVolume: true
if we use boot_device sugar, it generates ignition with valid device automatically lets say for x86 it is /dev/disk/part-label/root. However for the s390x device /dev/sd[] needs to be provided manually in that case right? because butane does not know how many disk present in the vm node.
It doesn't have anything to do with the number of disks. On other arches, we can find the existing partition with the label root and reformat it, since CoreOS reserves that label for the root partition. But MBR and DASD partition tables don't support partition labels, or any equivalent functionality, so that trick doesn't work.
Hi @bgilbert
From the above discussion, here are the major rules I captured for s390x. Please let me know if there are any errors or additional requirements.
- Add layouts specific to s390x: `s390x-zfcp`, `s390x-eckd`, and `s390x-virt`.
- Add `boot_device.luks.device`, specifically for `s390x-zfcp` and `s390x-eckd`, and forbidden for all other layouts, including `s390x-virt`.
- The configuration is generated only with `boot_device.luks`; conversion fails if a `boot_device.mirror` configuration is specified for `s390x-eckd` or `s390x-zfcp`.
- The expected Butane sugar from the user's perspective is below.
s390x-eckd
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasd[a-z]
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
s390x-zfcp
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sd[a-z]
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
s390x-virt
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-virt
  luks:
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
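The rules above, together with the reviewer's earlier requests (require the device field where a layout uses it, forbid it elsewhere, reject mirroring for DASD and zFCP), could be checked roughly like this. This is a hypothetical Go sketch: the types, the `validate` function, and the error messages are not Butane's. The regular expressions are one way to express the `/dev/sd[a-z]` and `/dev/dasd[a-z]` shapes shown above; they are stricter than a bare `/dev/sd` or `/dev/dasd` prefix check (rejecting partition paths like `/dev/sda2`) and also accept multi-letter names such as `/dev/sdaa`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical, simplified model of boot_device; not Butane's real types.
type luksConfig struct {
	Device string // proposed boot_device.luks.device
}

type bootDevice struct {
	Layout string
	Luks   *luksConfig // nil if boot_device.luks is absent
	Mirror []string    // boot_device.mirror.devices
}

// Layouts that need an explicit whole-disk device, with the accepted shape.
var devicePatterns = map[string]*regexp.Regexp{
	"s390x-zfcp": regexp.MustCompile(`^/dev/sd[a-z]+$`),
	"s390x-eckd": regexp.MustCompile(`^/dev/dasd[a-z]+$`),
}

func validate(bd bootDevice) error {
	pattern, needsDevice := devicePatterns[bd.Layout]
	// As agreed in the thread, these layouts support only boot_device.luks.
	if needsDevice && len(bd.Mirror) > 0 {
		return fmt.Errorf("layout %s does not support boot_device.mirror", bd.Layout)
	}
	if bd.Luks == nil {
		return nil
	}
	dev := bd.Luks.Device
	switch {
	case needsDevice && dev == "":
		// Mandatory, so users can't silently get the wrong disk.
		return fmt.Errorf("layout %s requires boot_device.luks.device", bd.Layout)
	case !needsDevice && dev != "":
		// Forbidden for layouts that locate the root partition by label.
		return fmt.Errorf("boot_device.luks.device is not allowed with layout %q", bd.Layout)
	case needsDevice && !pattern.MatchString(dev):
		return fmt.Errorf("device %q does not match %s for layout %s", dev, pattern, bd.Layout)
	}
	return nil
}

func main() {
	good := bootDevice{Layout: "s390x-eckd", Luks: &luksConfig{Device: "/dev/dasdb"}}
	bad := bootDevice{Layout: "s390x-virt", Luks: &luksConfig{Device: "/dev/vda"}}
	fmt.Println(validate(good))
	fmt.Println(validate(bad))
}
```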
Looks good! Could you also post the expected output for each of those configs?
Note that FCOS should also support these layouts, so the implementation should happen in the fcos experimental spec.
Will implement it for FCOS as well, in the experimental spec.
Here is the expected output for each config.
s390x-eckd
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasda  # example; if we give /dev/dasdb, the corresponding disk is used
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
Converting to Ignition:
# butane s390x-eckd.bu -o s390x-eckd_out.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/dasda2  # partition on the example disk; /dev/dasdb2 if the user sets /dev/dasdb in device
          label: luks-root
          name: root
          wipeVolume: true
s390x-zfcp
variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sda  # example; if we give /dev/sdb, the corresponding disk is used
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
Converting to Ignition:
# butane s390x-zfcp.bu -o s390x_zfcp_out.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/sda2  # partition on the example disk; /dev/sdb2 if the user sets /dev/sdb in device
          label: luks-root
          name: root
          wipeVolume: true
s390x-virt
The device field is forbidden for s390x-virt, as for the other arches.
variant: openshift
version: 4.13.0
metadata:
name: worker-storage
labels:
machineconfiguration.openshift.io/role: worker
boot_device:
layout: s390x-virt
luks:
tang:
- url: http://tang1.example.com:7500
thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
Converting to Ignition:
# butane s390x-virt.bu -o s390x_virt_out.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/disk/by-partlabel/root
          label: luks-root
          name: root
          wipeVolume: true
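The mapping from the user's input to the `device` in the generated `luks` entry, as shown in the three expected outputs, can be sketched in Go. The `rootLuksDevice` function is hypothetical, the hardcoded partition number 2 is taken from the expected outputs above rather than from a confirmed s390x partition layout, and the input is assumed to have been validated already.

```go
package main

import "fmt"

// rootLuksDevice returns the device for the generated luks entry.
// Hypothetical helper; assumes the device was already validated.
func rootLuksDevice(layout, device string) (string, error) {
	switch layout {
	case "s390x-zfcp", "s390x-eckd":
		// These layouts can't rely on partition labels, so the root
		// partition is addressed by number on the user-specified disk.
		if device == "" {
			return "", fmt.Errorf("layout %s requires a device", layout)
		}
		return device + "2", nil // partition 2, per the expected output
	case "s390x-virt", "x86_64", "aarch64", "ppc64le":
		// GPT: CoreOS reserves the "root" label for the root partition.
		return "/dev/disk/by-partlabel/root", nil
	default:
		return "", fmt.Errorf("unknown layout %q", layout)
	}
}

func main() {
	for _, c := range []struct{ layout, device string }{
		{"s390x-eckd", "/dev/dasdb"},
		{"s390x-zfcp", "/dev/sda"},
		{"s390x-virt", ""},
	} {
		dev, err := rootLuksDevice(c.layout, c.device)
		fmt.Println(c.layout, dev, err)
	}
}
```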
Looks good. Note that you don't need to implement on FCOS also. If you implement it in the FCOS experimental spec, the OpenShift spec will automatically inherit.