linstor-csi
Is it possible to use linstor-gateway to provide NFS exports?
We are looking for a solution to #69 and wondering whether it's possible to create a deployment of https://github.com/LINBIT/linstor-gateway that exposes NFS exports.
There doesn't appear to be much documentation on how the gateway works with NFS, or on whether it provides any advantages over a https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner deployment (we have an issue with that deployment: when the NFS pod is rescheduled, the NFS exports hang on the same node).
Our main use case for NFS on top of LINSTOR is to provide load balancing as well as HA.
It's complicated.
The issue with linstor-gateway in this context is that it is basically a whole different project, mainly aimed at running directly on the host. So we would need to wrap it in a container (which might be difficult, since it strongly depends on systemd) and then wrap it in some CSI driver logic. If at all possible I'd like to avoid all that.
I think it would be much better to find out what exactly does not work with the nfs-ganesha project.
@WanzenBug Thanks for considering this. I agree that it would be better to find a solution to the nfs-ganesha bug.
I have opened an issue at https://github.com/nfs-ganesha/nfs-ganesha/issues/825; however, at the moment we've hit a brick wall.
If we can create a minimal reproducible scenario, would you be able to test it independently, so we can rule out any issue with our own setup?
Please share the reproducer if you can, but I can't promise when I'll find the time to verify it.
I had a look through the linked ticket. I'm not really familiar with all the details of the NFS protocol, but:
- Have you tried mounting after the fail-over from a completely new node? That way you could verify that the NFS server is starting as expected.
- This is something we observed in linstor-gateway, which uses the in-kernel NFS server, so there is no guarantee it translates to nfs-ganesha: NFS is not a stateless protocol, so in order for the server to fail over, it also needs to save the connection state. In linstor-gateway this is why there is a second volume for NFS exports, which I believe is mounted on /var/lib/nfs. I believe the symptoms you are experiencing match what we saw when that state information was missing. No idea how this is handled in nfs-ganesha, though (the idea is sketched below).
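To sketch that idea (this is not linstor-gateway's actual configuration, just a hypothetical translation into a pod spec; the image, volume and claim names are made up): besides the export volume, the NFS server gets a second persistent volume for its state directory, so that both move with it on failover.

```yaml
# Hypothetical sketch: an NFS server pod with a second persistent volume
# for the server's state directory, mirroring the linstor-gateway layout.
# Image, names, and claim names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: nfs-server
spec:
  containers:
    - name: nfs-server
      image: example.org/nfs-server:latest    # placeholder image
      volumeMounts:
        - name: exports                       # the data exported over NFS
          mountPath: /export
        - name: nfs-state                     # connection/client state needed for failover
          mountPath: /var/lib/nfs
  volumes:
    - name: exports
      persistentVolumeClaim:
        claimName: nfs-exports                # replicated volume (e.g. LINSTOR-backed)
    - name: nfs-state
      persistentVolumeClaim:
        claimName: nfs-state                  # must fail over together with the server
```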
Unfortunately I've not had time to work on the reproducer; instead we're creating a controller that force-deletes the NFS pods, and any pods mounting them, when a node fails (roughly sketched below).
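Roughly, such a controller only needs to watch nodes and be allowed to delete pods; a force delete is simply a delete with a grace period of 0. A minimal sketch of the RBAC it would need (names are placeholders, the controller itself is out of scope here):

```yaml
# Hypothetical ClusterRole for a "force-delete NFS pods on node failure" controller.
# A force delete is a regular delete issued with gracePeriodSeconds: 0.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-failover-controller
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]            # detect NotReady nodes
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "delete"]  # force-delete stuck NFS pods and their consumers
```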
- Yes, a new pod is able to mount via the rescheduled NFS pod; only pods with the old NFS mount fail to reconnect.
- We're using https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner as the provisioner, which stores its configuration in /export. That is a Piraeus-replicated volume, so the state should be persisted (see the sketch below for the storage layout).
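For reference, this is roughly what that storage layout looks like as manifests. It is a sketch based on the description above, not the provisioner's shipped example; the StorageClass name, parameters and size are placeholders, only the LINSTOR CSI provisioner name (linstor.csi.linbit.com) is real.

```yaml
# Sketch of the layout described above: the provisioner's /export directory
# backed by a LINSTOR/Piraeus-provisioned, replicated PVC.
# Names, parameters, and sizes are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-replicated
provisioner: linstor.csi.linbit.com   # LINSTOR CSI driver
# replication parameters depend on the cluster setup and are omitted here
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-ganesha-export
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: linstor-replicated
  resources:
    requests:
      storage: 50Gi
# The nfs-ganesha provisioner pod mounts this claim at /export, so the
# exports and their configuration are persisted across a reschedule.
```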
We have narrowed this down to an issue with Cilium when using its kube-proxy replacement; it works as expected without kube-proxy replacement: https://github.com/cilium/cilium/issues/21541
I will close this issue, as further updates will be tracked in the Cilium issue above.