longhorn-manager
longhorn-manager copied to clipboard
fix(csi): add node info to failed mount logging.
Which issue(s) this PR fixes:
Issue # https://github.com/longhorn/longhorn/issues/7931
What this PR does / why we need it:
Log with kernel release and OS info when an NFS mount fails, to aid in troubleshooting.
Special notes for your reviewer:
Additional documentation or context
Right, the first draft of this PR used sys.GetKernelRelease() from go-common-libs for the kernel release.
But to get the OS distro was a little trickier. ns.GetOSDistro() does an nsenter using the path /host/proc/ns/1/mnt which exists in a "normal" pod, but not in the longhorn-csi-plugin, so the call fails. There the right path would be /rootfs/proc/ns/1/mnt. So we can't use the ns pkg.
But sys.GetOSDistro(osReleaseContent string) isn't really directly usable. It is just a utiilty method for ns.GetOsDistro() which feeds it the contents of the /etc/os-release file to parse.
I figured I would just use the result from uname -v which would be something like
#74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023.
and so I borrowed the parsing code from sys.GetKernelRelease and did the equivalent of uname -rv
If I read the contents of the local /etc/os-release without doing the namespace stuff, and use that to call sys.GetOSDistro(), I'll get the info for the pod itself, which is
longhorn-csi-plugin-bchgv:/ # cat /etc/os-release
NAME="SLES"
VERSION="15-SP5"
VERSION_ID="15.5"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp5"
DOCUMENTATION_URL="https://documentation.suse.com/"
which is clearly wrong, so I'll need to do an equivalent nsenter but to the /rootfs/proc path to get this for the host:
root@jbm-u22-pool2-ca5a5aa4-25fzl:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
BTW, GetOSDistro returns the value of the ID= tag, but for purposes of this logging, I would think that PRETTY_NAME would be more informative.
Revised to use the go-common-libs/sys utilities. I had to make a local copy of RunFunc to use the CSI plugin's different location for /proc and some slightly modified flavors of the common-libs/ns utilities.
One other option: Mount the host's / on /host for the CSI deployment(s). The path /rootfs is set in this code:
https://github.com/longhorn/longhorn-manager/blame/103bcaaae5e080dc93af66bf0f8355c31d2f1b0e/csi/deployment.go#L417
and I'm not sure whether the comment is asserting that this specific path is required. It would be handy if it could follow the same pattern as other host root mounts.
It appears that it is, though, looking at, say, https://github.com/kubernetes/utils/blob/4693a0247e57ebeacb9f261f07a03c9ecd569f44/nsenter/nsenter.go#L37 I'm not quite sure of the scope of that requirement.
One other option: Mount the host's
/on/hostfor the CSI deployment(s). The path/rootfsis set in this code:103bcaa/csi/deployment.go#L417 (blame) and I'm not sure whether the comment is asserting that this specific path is required. It would be handy if it could follow the same pattern as other host root mounts.
I think we can change to /host directly, since /rootfs was introduced in the older RWX implementation? You can run RWX related test cases to see if there will be any regression.
That's right. Manual and regression testing are happy with the change to mount host's / on /host. I have pushed up the change.
@mergify backport v1.6.x v1.5.x
backport v1.6.x v1.5.x
✅ Backports have been created
- #2713 fix(csi): add node info to failed mount logging. (backport #2686) has been created for branch
v1.6.x - #2714 fix(csi): add node info to failed mount logging. (backport #2686) has been created for branch
v1.5.x