
fails to format

Open davidkarlsen opened this issue 2 years ago • 23 comments

What steps did you take and what happened:

   ----              ----               -------
  Warning  FailedScheduling  6m2s              default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: selectedNode annotation reset for PVC "elasticsearch-elasticsearch-cdm-4qo1qel7-1"
  Normal   Scheduled         16s               default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-4qo1qel7-1-6db94d4d88-lwtv7 to alp-dts-g-c01oco09
  Warning  FailedMount       5s (x5 over 13s)  kubelet            MountVolume.SetUp failed for volume "pvc-c9073859-fd54-4890-b444-b96e6f46dea1" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

because:

 mkfs.xfs /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1
mkfs.xfs: /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 appears to contain an existing filesystem (xfs).
mkfs.xfs: Use the -f option to force overwrite.

Maybe it should force the format by default, or a note should be added to the docs.
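For illustration, picking a per-filesystem force flag before formatting could look like the sketch below. The helper name `mkfs_force_flag` is invented for this example and is not lvm-localpv's actual code:

```shell
#!/bin/sh
# Hypothetical helper: return the flag that tells mkfs to overwrite an
# existing filesystem signature for a given filesystem type.
mkfs_force_flag() {
  case "$1" in
    xfs|btrfs)      echo "-f" ;;  # mkfs.xfs refuses a stale superblock without -f
    ext2|ext3|ext4) echo "-F" ;;
    *)              echo "" ;;    # unknown fs: add nothing
  esac
}

# Build (but don't run) the full command line for the device in this issue:
fs=xfs
dev=/dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1
echo "mkfs.$fs $(mkfs_force_flag "$fs") $dev"
# prints: mkfs.xfs -f /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1
```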

What did you expect to happen: formatting should happen

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • LVM Driver version
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

davidkarlsen avatar Aug 23 '21 11:08 davidkarlsen

Hi @davidkarlsen Can you please tell us the environment details like LVM-driver version, k8s version and OS?

w3aman avatar Aug 23 '21 12:08 w3aman

Hi @davidkarlsen Can you please tell us the environment details like LVM-driver version, k8s version and OS?

lvm version
  LVM version:     2.02.187(2)-RHEL7 (2020-03-24)
  Library version: 1.02.170-RHEL7 (2020-03-24)
  Driver version:  4.37.1
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
uname -a
Linux alp-dts-g-c01oco07 3.10.0-1160.36.2.el7.x86_64 #1 SMP Thu Jul 8 02:53:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.9 (Maipo)

openebs helm chart 2.12.0

davidkarlsen avatar Aug 23 '21 12:08 davidkarlsen

@davidkarlsen -- this looks related to #75.

Would it be possible to try this on RHEL 8?

kmova avatar Aug 24 '21 07:08 kmova

It looks the same. Unfortunately I can't run on RHEL 8, as it's not supported for OCP. In my case I had just deleted some LVs and then created a new one, which probably landed at the same offset. Without clearing the old volume, mkfs will likely find a stale magic superblock and refuse to format unless forced - so any RHEL 7/8 comparison should account for that underlying condition.

davidkarlsen avatar Aug 24 '21 07:08 davidkarlsen

Can wiping and zeroing be controlled when the volumes are created? I'd recommend having both enabled by default.

davidkarlsen avatar Aug 24 '21 18:08 davidkarlsen

@davidkarlsen that was a planned item for LVM LocalPV. We already wipe the LVM partition when we delete the volume. From the error it looks like you already had a filesystem there before and the new volume landed at the same offset. We need to clear the fs signatures at creation time as well. We had planned this but somehow missed implementing it. Will take care of adding this enhancement.
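As a hedged sketch of what the wipe-at-create step amounts to (the `wipe_cmd` helper and the device path are illustrative, not the driver's implementation): run wipefs on the fresh LV before handing it to mkfs.

```shell
#!/bin/sh
# Hypothetical dry-run sketch: build the command that would erase stale
# filesystem signatures from a freshly created LV.
# wipefs -a removes all known signatures; -q keeps it quiet.
wipe_cmd() {
  echo "wipefs -a -q $1"
}

lv=/dev/datavg/pvc-example   # illustrative device path
echo "after lvcreate, run: $(wipe_cmd "$lv")"
# prints: after lvcreate, run: wipefs -a -q /dev/datavg/pvc-example
```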

pawanpraka1 avatar Aug 25 '21 07:08 pawanpraka1

Note that the safest approach is to wipe at create time too.

davidkarlsen avatar Aug 25 '21 08:08 davidkarlsen

@davidkarlsen I have raised a PR (https://github.com/openebs/lvm-localpv/pull/138) to fix it. Can you try with the image pawanpraka1/lvm-driver:vp and see if it works?

pawanpraka1 avatar Aug 25 '21 12:08 pawanpraka1

@davidkarlsen can you confirm the LVM driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

pawanpraka1 avatar Aug 26 '21 08:08 pawanpraka1

This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv version 0.6.0 already erases the fs signatures on the LVM volume at creation time; the fix was merged via #88. This issue can be reproduced by performing the following steps:

  • Create a volume (PVC) with ext4 fs and launch a pod.
  • Delete the pod & volume (PVC).
  • Create a volume with XFS fs and launch a pod... the issue will then be reproducible. Note: if we create the volume again with the same fs as the previous one, the application is able to access it.

mittachaitu avatar Aug 26 '21 09:08 mittachaitu

@davidkarlsen can you confirm the LVM driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

LVM Driver Version :- 0.8.0 - commit :- 929ae4439f2da71a2d6ee5bda6a33dd2f7d424fc

davidkarlsen avatar Aug 26 '21 12:08 davidkarlsen

This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv version 0.6.0 already erases the fs signatures on the LVM volume at creation time; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?

davidkarlsen avatar Aug 29 '21 20:08 davidkarlsen

This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv version 0.6.0 already erases the fs signatures on the LVM volume at creation time; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?

Yes, we are passing the -f (force) option from version 0.6.0 onwards.

mittachaitu avatar Aug 30 '21 04:08 mittachaitu

This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv version 0.6.0 already erases the fs signatures on the LVM volume at creation time; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?

Yes, we are passing the -f (force) option from version 0.6.0 onwards.

Then it's a bit surprising to meet this in the current release, for two reasons:

  1. If volumes are wiped at creation, the superblock should be gone in the first place and the bug should not surface.
  2. If formatting is forced, mkfs should ignore the stale superblock and proceed anyway.

I'll try to reproduce this in a third cluster when I have time.

davidkarlsen avatar Aug 30 '21 23:08 davidkarlsen

Tried now with the 2.12.2 chart; still the same:

                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m20s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Warning  FailedScheduling  3m18s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Normal   Scheduled         3m4s                default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-cqg8zvqd-1-5596fc5479-7lmtg to alp-ksx-c01oco05
  Warning  FailedMount       62s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[kube-api-access-29pgd elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates]: timed out waiting for the condition
  Warning  FailedMount       57s (x9 over 3m5s)  kubelet            MountVolume.SetUp failed for volume "pvc-5128b42c-a7c1-403b-b599-2cadf8984328" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-5128b42c-a7c1-403b-b599-2cadf8984328 /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount
Output: mount: /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--5128b42c--a7c1--403b--b599--2cadf8984328, missing codepage or helper program, or other error.

davidkarlsen avatar Sep 03 '21 19:09 davidkarlsen

same problem on 2.12.5

davidkarlsen avatar Sep 09 '21 20:09 davidkarlsen

From the logs:

I0909 20:35:40.848768       1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"prometheus-k8s-1","csi.storage.k8s.io/pod.namespace":"openshift-monitoring","csi.storage.k8s.io/pod.uid":"179a5e86-43a5-43f7-b78e-b11af4368674","csi.storage.k8s.io/serviceAccount.name":"prometheus-k8s","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"datavg","storage.kubernetes.io/csiProvisionerIdentity":"1631215660348-8081-local.csi.openebs.io"},"volume_id":"pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937"}
I0909 20:35:40.864001       1 mount_linux.go:366] Disk "/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937" appears to be unformatted, attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
I0909 20:35:41.646181       1 mount_linux.go:376] Disk successfully formatted (mkfs): xfs - /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
E0909 20:35:41.648622       1 mount_linux.go:150] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
Output: mount: /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--d5be05a4--f5f8--4b7e--83b3--b53eaaff8937, missing codepage or helper program, or other error.

Note that there is no -f in: attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]

davidkarlsen avatar Sep 09 '21 20:09 davidkarlsen

The issue lies here: https://github.com/kubernetes/mount-utils/pull/5

davidkarlsen avatar Sep 09 '21 20:09 davidkarlsen

https://github.com/kubernetes/kubernetes/pull/104923

davidkarlsen avatar Sep 10 '21 22:09 davidkarlsen

Looks like even with the above force flag the issue is still the same... When this issue occurred, the following system logs were seen:

Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Superblock has unknown read-only compatible features (0x4) enabled.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Attempted to mount read-only compatible filesystem read-write.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Filesystem can only be safely mounted read only.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): SB validate failed with error -22.

The error -22 above corresponds to EINVAL, i.e. Invalid Argument (as I understand it, the kernel does not yet support the feature)... Some googling around the error led me to this page.

mkfs.xfs version on CentOS 7: 4.5.0; mkfs.xfs version in the container: 5.6.0. Looks like an incompatibility, as mentioned in the issue...

To resolve the issue we have to format the xfs filesystem with the following option: 'mkfs.xfs -m reflink=0 /dev/lvm/manual1'
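As a rough sketch of applying that workaround conditionally (the `xfs_mkfs_opts` helper is made up, and the major-version cutoff is only illustrative; real reflink support arrived in later 4.x kernels):

```shell
#!/bin/sh
# Hypothetical: choose mkfs.xfs options based on the host kernel's major
# version. CentOS 7 ships 3.10, which cannot mount a reflink-enabled xfs.
xfs_mkfs_opts() {
  kernel_major=${1%%.*}
  if [ "$kernel_major" -lt 4 ]; then
    echo "-m reflink=0"   # disable shared copy-on-write for old kernels
  else
    echo ""
  fi
}

echo "mkfs.xfs $(xfs_mkfs_opts 3.10.0) /dev/lvm/manual1"
# prints: mkfs.xfs -m reflink=0 /dev/lvm/manual1
```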

Attempt 1: formatted with xfs without using any flags:

bash-5.0# lvcreate -n manual1 -L 1G lvm
  Logical volume "manual1" created.
bash-5.0# mkfs.xfs /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

Attempt 2: formatted with xfs using the -m reflink=0 flag:

bash-5.0# lsblk -fa
NAME          FSTYPE      FSVER LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
fd0                                                                                         
loop0         squashfs                                                                      
loop1         squashfs                                                                      
loop2         squashfs                                                                      
sda                                                                                         
├─sda1        xfs                     8808cf9e-0900-4d7a-af19-36bf061d7a24                  
└─sda2        xfs                     72d0dc49-d80f-4aa8-a51f-51e237deb23e     10.9G    62% /var/lib/kubelet
sdb           LVM2_member             IvJ3Z4-PaLm-zZ5j-4oxK-H6dS-pkBk-KjcJSG                
└─lvm-manual1                                                                               
sr0                                                                                         
bash-5.0# lvs
  LV      VG  Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  manual1 lvm -wi-a----- 1.00g                                                    
bash-5.0# mkfs.xfs -m reflink=0 /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
bash-5.0# 
bash-5.0# df -h
Filesystem               Size  Used Avail Use% Mounted on
overlay                   29G   19G   11G  63% /
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda2                 29G   19G   11G  63% /plugin
devtmpfs                 1.9G     0  1.9G   0% /dev
shm                       64M     0   64M   0% /dev/shm
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/32966bd7-fd41-4f49-b572-8a25a1dc802d/volumes/kubernetes.io~secret/kube-proxy-token-tmnfs
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/8e82a39d-d592-4051-83f2-bb372f568246/volumes/kubernetes.io~secret/flannel-token-fpwlc
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/be4ddd06-bed9-4d18-bb54-26e67c77eb74/volumes/kubernetes.io~secret/openebs-maya-operator-token-sj7w5
tmpfs                    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/lvm-manual1 1014M   33M  982M   4% /var/lib/kubelet/mnt/store1
bash-5.0# 
  • Able to mount when the -m reflink=0 flag is used, which tells xfs to disable the shared copy-on-write feature that is not supported by CentOS 7 (AFAIK).

Red Hat document which says to pass the reflink option

mittachaitu avatar Sep 12 '21 14:09 mittachaitu

@mittachaitu I believe that's another problem (it has a different error message) - please create a separate issue for that.

davidkarlsen avatar Sep 12 '21 15:09 davidkarlsen

mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

Above is the error I got when I tried to mount the xfs-formatted LVM volume, and the issue description has a similar error... So I believe both are the same.

Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

The above is from issue description

mittachaitu avatar Sep 13 '21 05:09 mittachaitu

@w3aman could you by any chance pull in my hack on mount_utils? Merging it into Kubernetes and waiting for a release will take forever.

davidkarlsen avatar Sep 14 '21 12:09 davidkarlsen

A reasonable update at the moment is to mention in our documentation that the combination of xfs and an older kernel (< 5.10) may run into this issue, which can be mitigated by updating the host node's kernel version.
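A quick way to check whether a node may be affected (a sketch using sort -V for version comparison; the 5.10 cutoff is the one mentioned above):

```shell
#!/bin/sh
# Is the second version argument older than the first, i.e. does the
# running kernel predate the threshold? Uses sort -V for version ordering.
kernel_older_than() {
  want=$1; have=$2
  [ "$have" != "$want" ] &&
    [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n1)" = "$have" ]
}

if kernel_older_than 5.10 "$(uname -r | cut -d- -f1)"; then
  echo "this node's kernel predates 5.10; xfs reflink mounts may fail"
fi
```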

dsharma-dc avatar Jun 05 '24 07:06 dsharma-dc

Documented in PR #448 & PR #451

balaharish7 avatar Jun 14 '24 08:06 balaharish7