
Is it possible to use linstor-gateway to provide NFS exports?

Rid opened this issue 3 years ago · 4 comments

We are looking for a solution to #69 and wondering whether it's possible to create a deployment of https://github.com/LINBIT/linstor-gateway that exposes NFS exports.

There doesn't appear to be much documentation on how the gateway works with NFS, or on whether it provides any advantages over a https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner deployment (we have an issue with that deployment: when the NFS pod is rescheduled, the NFS exports hang on the same node).

The main use case for NFS on top of LINSTOR is to provide load balancing as well as HA.
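For reference, the gateway's NFS CLI looks roughly like this when run directly on a host (the command shape is taken from the LINBIT user guide; the resource name, service IP, size, and allowed range below are made-up example values):

    # Create an HA NFS export backed by a LINSTOR resource.
    # "nfstest", the service IP, and the 2G size are example values.
    linstor-gateway nfs create nfstest 192.168.211.122/24 2G \
        --allowed-ips 192.168.211.0/24

What we are unsure about is whether something equivalent can run as a Kubernetes deployment.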

Rid avatar Jun 07 '22 09:06 Rid

It's complicated.

The issue with linstor-gateway in this context is that it is basically a whole different project, mainly aimed at running directly on the host. We would need to wrap it in a container (which might be difficult, as it strongly depends on systemd) and then wrap that in some CSI driver logic. If at all possible, I'd like to avoid all that.

I think it would be much better to find out what exactly does not work with the nfs-ganesha project.

WanzenBug avatar Jul 01 '22 09:07 WanzenBug

@WanzenBug Thanks for considering it. I agree that it would be better to find a solution to the nfs-ganesha bug.

I have opened an issue @ https://github.com/nfs-ganesha/nfs-ganesha/issues/825, however at the moment we've hit a brick wall.

If we can create a minimal reproducible scenario, would you be able to test it independently, in order to rule out any issue with our own setup?

Rid avatar Jul 15 '22 13:07 Rid

Please share the reproducer if you can, but I can't promise when I'll find the time to verify it.

I had a look through the linked ticket. I'm not really familiar with all the details of the NFS protocol, but:

  • Have you tried mounting after the fail-over from a completely new node? That way you can verify that the NFS server itself is starting as expected.
  • This is something we observed in linstor-gateway, which uses the in-kernel NFS server, so there is no guarantee it translates to nfs-ganesha: NFS is not a stateless protocol, so for the server to fail over it also needs to save the connection state. This is why linstor-gateway uses a second volume for its NFS exports, which I believe is mounted at /var/lib/nfs. The symptoms you describe match what we saw when that state information was missing. No idea how this is handled in nfs-ganesha, though. (A quick way to check both points is sketched below.)
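Assuming a hypothetical service IP and export path (adjust to your setup), something like this would check both points:

    # 1. From a node that has never mounted this export before:
    mkdir -p /mnt/nfstest
    mount -t nfs 10.43.7.223:/export /mnt/nfstest && ls /mnt/nfstest
    # If this works, the rescheduled NFS server itself came up fine.

    # 2. On the host now running the NFS server, check whether the
    #    client/state tracking directory survived the fail-over:
    ls /var/lib/nfs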

WanzenBug avatar Jul 18 '22 07:07 WanzenBug

Unfortunately I've not had time to work on the reproducer; instead, we're creating a controller that force-deletes the NFS pods, and any pods mounting them, upon node failure (a rough sketch of what it does follows the list below).

  • Yes, a new pod is able to mount via the rescheduled NFS pod; only pods holding the old NFS mount fail to reconnect.
  • We're using https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner as the provisioner. It stores its configuration in /export, which is a Piraeus-replicated volume, so the state should be persisted.
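The controller is essentially automating the following (the node name and label selectors are made-up examples from our setup):

    # When a node is detected as failed, force-delete the NFS server
    # pod so it is rescheduled immediately...
    NODE=worker-3
    kubectl delete pod -n nfs -l app=nfs-server \
        --field-selector spec.nodeName=$NODE \
        --grace-period=0 --force
    # ...and force-delete any consumer pods still holding the stale mount:
    kubectl delete pod -l uses-nfs=true --grace-period=0 --force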

Rid avatar Aug 17 '22 11:08 Rid

We have narrowed this down to an issue with Cilium when using kube-proxy-replacement; everything works as expected without kube-proxy-replacement: https://github.com/cilium/cilium/issues/21541
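For anyone hitting the same thing, this is roughly how we confirmed and worked around it (the Helm release name and namespace are examples, and the accepted kubeProxyReplacement values depend on your Cilium version):

    # Check whether kube-proxy replacement is active:
    cilium status | grep KubeProxyReplacement
    # Reverting to the standard kube-proxy path avoided the hung mounts:
    helm upgrade cilium cilium/cilium -n kube-system \
        --reuse-values --set kubeProxyReplacement=disabled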

I will close this as updates will be in the above issue.

Rid avatar Oct 02 '22 13:10 Rid