Docs: failed to discover kernel config and builtin modules in the development environment started using tilt

Open ChaoyiHuang opened this issue 1 year ago • 4 comments

What happened: When following the developer guide and running with tilt, nfd-worker is unable to discover the kernel config and builtin modules.

The error messages are:

nfd-worker │ E0826 07:07:12.558316 1 kernel.go:134] "failed to read kconfig" err="failed to read kernel config from [ /proc/config.gz /host-usr/src/linux-6.5.0-1025-gcp/.config /host-usr/src/linux/.config /host-usr/lib/modules/6.5.0-1025-gcp/config /host-usr/lib/ostree-boot/config-6.5.0-1025-gcp /host-usr/lib/kernel/config-6.5.0-1025-gcp /host-usr/src/linux-headers-6.5.0-1025-gcp/.config /lib/modules/6.5.0-1025-gcp/build/.config /host-boot/config-6.5.0-1025-gcp]"

nfd-worker │ E0826 07:07:12.559161 1 kernel.go:149] "failed to get builtin kernel modules" err="failed to read file /host-lib/modules/6.5.0-1025-gcp/modules.builtin: open /host-lib/modules/6.5.0-1025-gcp/modules.builtin: no such file or directory"

What you expected to happen: Regardless of the host OS, the kernel config and builtin modules should be discovered correctly.

How to reproduce it (as minimally and precisely as possible):

  1. Start a virtual machine running Ubuntu 22.04.
  2. Follow the developer guide.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): ubuntu 22.04
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

ChaoyiHuang avatar Aug 28 '24 01:08 ChaoyiHuang

/assign @fmuyassarov

marquiz avatar Aug 28 '24 08:08 marquiz

The issue occurs because the k3d/kind cluster created by ctlptl runs inside a container, which serves as the virtual host.

The host folders scanned by NFD feature discovery must be mounted into that container (the virtual host). Otherwise, the nfd-worker container, which runs inside the virtual host, sees only the /boot and /lib folders of the default base image rootfs. These are usually empty, so discovery fails.

For k3d, the following script passes the host folders into the container. Ideally, these folders should be mounted read-only.

ctlptl create cluster kind --registry=ctlptl-registry

cat <<EOF | ctlptl apply -f -
apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
registry: ctlptl-registry
product: k3d
k3d:
  v1alpha5Simple:
    volumes:
      - volume: /boot:/boot
        nodeFilters:
          - server:0
          - agent:*
      - volume: /lib:/lib
        nodeFilters:
          - server:0
          - agent:*
EOF
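
To check whether the files nfd-worker needs are actually visible, something like the following sketch may help. It is not part of NFD or ctlptl. The function name `check_kernel_files` is made up for illustration, and the two paths are inferred from the error messages above (the `/host-boot` and `/host-lib` prefixes are assumed to map to `/boot` and `/lib` on the node).

```shell
#!/bin/sh
# Hypothetical helper (not part of NFD): report whether the kernel config and
# the builtin-modules list exist under a given root for a given kernel version.
# Usage: check_kernel_files ROOT KERNEL_VERSION
check_kernel_files() {
    root="$1"
    kver="$2"
    missing=0
    # Paths inferred from the nfd-worker errors quoted in this issue.
    for f in "$root/boot/config-$kver" "$root/lib/modules/$kver/modules.builtin"; do
        if [ -r "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Check the filesystem this script runs on, using the running kernel version.
check_kernel_files "" "$(uname -r)" || echo "kernel files are not visible here"
```

Running this on the VM and then again inside the node container should show whether the volumes were passed through: before the mounts are added, both files would be expected to be reported as missing inside the container.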

ChaoyiHuang avatar Aug 28 '24 08:08 ChaoyiHuang

Hi @ChaoyiHuang, good that the reason was found. Would it make sense to add some notes to the documentation? If so, would you be willing to contribute such a change?

ping @fmuyassarov

marquiz avatar Sep 18 '24 17:09 marquiz

@marquiz @fmuyassarov I'd be happy to update the documentation. I will submit a PR later.

ChaoyiHuang avatar Sep 23 '24 00:09 ChaoyiHuang

Closing this, as it is fixed now in https://github.com/kubernetes-sigs/node-feature-discovery/pull/1889. Feel free to re-open it otherwise. /close

fmuyassarov avatar Nov 06 '24 10:11 fmuyassarov

@fmuyassarov: Closing this issue.

k8s-ci-robot avatar Nov 06 '24 10:11 k8s-ci-robot