Docs: CRI-O runtime hook location

Open heilerich opened this issue 4 years ago • 1 comments

The most recent release of the GPU Operator (v1.6.2) changed the location of the CRI-O runtime hook to /run/containers/oci/hooks.d. After a fresh install of the GPU Operator on a vanilla k8s cluster running the most recent CRI-O (v1.20.1) with the default configuration, the nvidia-device-plugin pods were stuck in a crash loop. It took me some time to figure out that I had to edit the default CRI-O settings in /etc/crio/crio.conf to include the new (non-standard?) location of the runtime hook to get it working.

 hooks_dir = [
        "/usr/share/containers/oci/hooks.d",
+        "/run/containers/oci/hooks.d",
 ]

This step wasn't necessary before v1.6.2 and I couldn't find any explanation regarding this in the docs. Shouldn't this be mentioned in the installation manual? Maybe somewhere around here?

Or is there something wrong with my setup and this should work out of the box?

heilerich, Mar 21 '21 16:03

@heilerich Sorry, we missed documenting this for vanilla K8s. With OpenShift 4.5/4.6/4.7, the CRI-O config includes the /run/containers/oci/hooks.d directory by default, but it doesn't look like that is the case with other flavors. We will add a note to the docs. Currently it's only mentioned in the release notes: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/release-notes.html#id1

shivamerla, Mar 22 '21 15:03

@heilerich With v22.9.2 we have reverted to using the default hook location that is configured with CRI-O. Please re-open if you still see any issues with the default settings.

shivamerla, Apr 05 '23 06:04