
RA Filesystem does not umount OCFS2 when a VirtualDomain has access

Open e-ferrari opened this issue 5 years ago • 7 comments

Hi, I'm struggling with the Filesystem RA. I have run into problems several times: it couldn't umount an OCFS2 partition, so it failed, and that node got fenced. I had a look into it and I think I found the reason. The script uses "fuser -m /mnt/ocfs2" to find the processes accessing that mountpoint. I currently have a VirtualDomain running whose raw file resides there.

```
ha-idg-2:/mnt/share # lsof|grep /mnt/ocfs2
qemu-syst 1127 qemu 13ur REG 254,15 171798691840 583946 /mnt/ocfs2/idcc_devel.raw
qemu-syst 1127 qemu 14ur REG 254,15 171798691840 583946 /mnt/ocfs2/idcc_devel.raw
```

But fuser does not show this process:

```
ha-idg-2:/mnt/share # fuser -m /mnt/ocfs2/
ha-idg-2:/mnt/share #
```

So the script does not get a PID it could kill, umount is not possible, the RA fails and the node gets fenced. Is my understanding correct? Is that a bug or am I missing something?

System is SLES 12 SP4:

ha-idg-2:/mnt/share # rpm -q resource-agents
resource-agents-4.3.018.a7fb5035-3.25.1.x86_64

Bernd

e-ferrari • Oct 24 '19 16:10

Sorry for the bad formatting. I will learn. Bernd

e-ferrari • Oct 24 '19 17:10

No worries. You can also edit it and use the Preview tab to see if it's looking as expected.

In this case you probably want the ``` on the line before and the line after your command/output-block.

oalbrigt • Oct 25 '19 10:10

Hi, I'm struggling with the Filesystem RA. I have run into problems several times: it couldn't umount an OCFS2 partition, so it failed, and that node got fenced. I had a look into it and I think I found the reason. The script uses "fuser -m /mnt/ocfs2" to find the processes accessing that mountpoint. I currently have a VirtualDomain running whose raw file resides there.

ha-idg-2:/mnt/share # lsof|grep /mnt/ocfs2
qemu-syst 1127 qemu 13ur REG 254,15 171798691840 583946 /mnt/ocfs2/idcc_devel.raw
qemu-syst 1127 qemu 14ur REG 254,15 171798691840 583946 /mnt/ocfs2/idcc_devel.raw

But fuser does not show this process:

ha-idg-2:/mnt/share # fuser -m /mnt/ocfs2/
ha-idg-2:/mnt/share #

So the script does not get a PID it could kill, umount is not possible, the RA fails and the node gets fenced. Is my understanding correct? Is that a bug or am I missing something?

System is SLES 12 SP4:

ha-idg-2:/mnt/share # rpm -q resource-agents
resource-agents-4.3.018.a7fb5035-3.25.1.x86_64

Bernd

e-ferrari • Oct 25 '19 11:10

That's strange. Sounds like fuser somehow doesn't detect the processes as using files on that specific mount.

You could try setting force_unmount=safe to see if that solves the issue (it finds the processes via /proc instead of using the fuser command).
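As a rough illustration of the /proc approach (a minimal sketch, not the agent's actual code; `pids_using_dir` is a hypothetical helper name), the symlinks under each /proc/PID directory can be compared against the mount point:

```shell
#!/bin/sh
# Sketch: print the PIDs whose fd, cwd, root or exe links point into a directory.
# pids_using_dir is a hypothetical helper, not part of the Filesystem RA.
# $1 = directory to check, $2 = proc root (optional, defaults to /proc).
pids_using_dir() {
    dir=${1%/}                       # drop one trailing slash, if any
    proc=${2:-/proc}
    for link in "$proc"/[0-9]*/cwd "$proc"/[0-9]*/root \
                "$proc"/[0-9]*/exe "$proc"/[0-9]*/fd/*; do
        [ -L "$link" ] || continue   # skip unmatched globs and non-links
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            "$dir"|"$dir"/*)
                pid=${link#"$proc"/} # e.g. 1127/fd/13
                echo "${pid%%/*}"    # keep only the PID component
                ;;
        esac
    done | sort -un                  # one line per PID
}
```

Run it as root so that other users' /proc entries are readable, for example `pids_using_dir /mnt/ocfs2`. The sketch only looks at the fd, cwd, root and exe links; files that are only memory-mapped (listed in /proc/PID/maps) are not covered.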

oalbrigt • Oct 25 '19 12:10

I know for a fact that fuser will not report an NFS share defined on an existing mount. The device will be in use and cannot be unmounted, but you will not see it in fuser.

ezaton • Oct 26 '19 18:10

> That's strange. Sounds like fuser somehow doesn't detect the processes as using files on that specific mount.
>
> You could try setting force_unmount=safe to see if that solves the issue (it finds the processes via /proc instead of using the fuser command).

Hi oalbrigt, that solved the problem. Thank you!
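For anyone finding this later, a sketch of how the parameter might be set, assuming crmsh (as shipped with SLES) and a Filesystem resource called `fs_ocfs2`. That name is a placeholder, not the resource ID from this cluster:

```shell
# fs_ocfs2 is a placeholder; substitute the ID of your Filesystem primitive.
crm resource param fs_ocfs2 set force_unmount safe

# Show the value that is now configured.
crm resource param fs_ocfs2 show force_unmount
```

On clusters managed with pcs, `pcs resource update fs_ocfs2 force_unmount=safe` should be the equivalent.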

Bernd

e-ferrari • Oct 28 '19 20:10

Great. Glad to hear it.

oalbrigt • Oct 30 '19 15:10