gpu-operator icon indicating copy to clipboard operation
gpu-operator copied to clipboard

node-feature-discovery CRDs should be kept in the crds folder

Open dogra-gopal opened this issue 3 years ago • 4 comments

1. Quick Debug Checklist

  • [x] Are you running Kubernetes v1.13+?
  • [x] Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)

1. Issue or feature description

node-deature-discovery crds are present in: https://github.com/NVIDIA/gpu-operator/blob/master/deployments/gpu-operator/charts/node-feature-discovery/manifests/nodefeaturerule-crd.yaml This causes issue with helm-chart uninstallation, where helm tries to delete CRDs. It will lead to data loss during reinstall.

More info on how crds should be defined here: https://helm.sh/docs/topics/charts/#custom-resource-definitions-crds

dogra-gopal avatar Jul 15 '22 15:07 dogra-gopal

@shivamerla Can you take a look?

dogra-gopal avatar Jul 15 '22 15:07 dogra-gopal

@dogra-gopal thanks for reporting this. Didn't realize NFD chart didn't use crds/ folder. Will check with the NFD team on this.

shivamerla avatar Jul 15 '22 17:07 shivamerla

@dogra-gopal regarding below statement, can you elaborate more?

This causes issue with helm-chart uninstallation, where helm tries to delete CRDs. It will lead to data loss during reinstall.

I know this is not common practice to define CRD's under templates, but i see the benefit that they will be cleaned up after chart is un-installed and also CRD's get updated during helm upgrade. Given helm limitations of handling CRDs(where data-loss is not an issue), this approach is beneficial. I want to know what is the data-loss here when NFD chart itself is deleted (NodeFeatureRule CR or NodeResourceTopology CR?). Is that an issue?

shivamerla avatar Jul 22 '22 18:07 shivamerla

Is NFD CRD's having some useful user/state data, that will get deleted when chart is uninstalled? That could be some potential issue. Also, it could cause some race condition between installed operator and CRDs. The crds folder always gets installed first before the software is brought up to ensure stability, but in this scenario, kubernetes can not ensure CRDs are created before it is potentially used. It should be fine as long as CRDs is not referenced as part of templates.

dogra-gopal avatar Aug 16 '22 15:08 dogra-gopal

Hi, Thanks for the Helm chart.

Please do this change so that the installation of the chart makes CRDs behave as is usual with Helm

remidebette avatar Nov 08 '22 14:11 remidebette