nydus-snapshotter

Kubelet log has errors when using the default root dir

Open liubin opened this issue 2 years ago • 18 comments

The log (combined containerd and kubelet):

Dec 12 11:34:51 kind-control-plane containerd[2874]: time="2022-12-12T11:34:51.360081705Z" level=debug msg=ImageFsInfo
Dec 12 11:34:51 kind-control-plane containerd[2874]: time="2022-12-12T11:34:51.360168006Z" level=debug msg="ImageFsInfo returns filesystem info [&FilesystemUsage{Timestamp:1670844891360135406,FsId:&FilesystemIdentifier{Mountpoint:/var/lib/containerd/io.containerd.snapshotter.v1.nydus,},UsedBytes:&UInt64Value{Value:0,},InodesUsed:&UInt64Value{Value:0,},}]"
Dec 12 11:34:51 kind-control-plane kubelet[601]: E1212 11:34:51.360778 601 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir "/var/lib/containerd/io.containerd.snapshotter.v1.nydus": stat failed on /var/lib/containerd/io.containerd.snapshotter.v1.nydus with error: no such file or directory" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.nydus"

Containerd returns a fixed-format directory for snapshotters:

func imageFSPath(rootDir, snapshotter string) string {
	return filepath.Join(rootDir, fmt.Sprintf("%s.%s", plugin.SnapshotPlugin, snapshotter))
}
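
To make the resulting path concrete, here is a minimal runnable sketch of the same helper. It assumes containerd's default root directory /var/lib/containerd and redefines the plugin type string locally as snapshotPlugin (a stand-in for plugin.SnapshotPlugin) so that it compiles without importing containerd.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// snapshotPlugin stands in for containerd's plugin.SnapshotPlugin constant.
// It is redefined here only so the sketch has no external dependencies.
const snapshotPlugin = "io.containerd.snapshotter.v1"

// imageFSPath mirrors the helper quoted above: the image filesystem path is
// always <containerd root>/<plugin type>.<snapshotter name>.
func imageFSPath(rootDir, snapshotter string) string {
	return filepath.Join(rootDir, fmt.Sprintf("%s.%s", snapshotPlugin, snapshotter))
}

func main() {
	// Assumption: containerd runs with its default root directory.
	fmt.Println(imageFSPath("/var/lib/containerd", "nydus"))
}
```

With these inputs the sketch prints the same mountpoint that appears in the log, regardless of where nydus-snapshotter actually keeps its data.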

liubin avatar Dec 14 '22 11:12 liubin

Does it mean we must use $containerd_root_dir/io.containerd.snapshotter.v1.nydus as the root dir of nydus snapshotter?

imeoer avatar Dec 15 '22 01:12 imeoer

Yes, if the root option is set to /var/lib/containerd/io.containerd.snapshotter.v1.nydus, the error logs will disappear.

It seems that the nydus snapshotter's root directory must be under containerd's root to avoid these errors.

But containerd's ImageFsInfo interface also has some limitations. For example, it only returns the fsinfo of the default global snapshotter and does not include the fsinfo of any snapshotter plugins.

liubin avatar Dec 15 '22 02:12 liubin

It seems the snapshotter root directory is passed to kubelet so that components like cAdvisor can collect Pod statistics. Pod eviction based on imageFS does not need it. The disk usage is reported by nydus-snapshotter to containerd.

changweige avatar Dec 15 '22 03:12 changweige

We still need to ensure no error messages are thrown, right? Although maybe it doesn't have any effect.

imeoer avatar Dec 15 '22 09:12 imeoer

I think fixing this on the containerd side will be ideal.

liubin avatar Dec 15 '22 11:12 liubin

I'm trying to install nydus-snapshotter with helm, and I got this error too. I can't find any place to modify the root option. How can I fix it?

Riverdd avatar Feb 21 '23 08:02 Riverdd

It depends on which version of nydus-snapshotter you are using. If it is a self-built nydus-snapshotter from the main branch, the root dir can be changed in its configuration file. If it is nydus-snapshotter 0.5 or below, the root dir can be changed with the CLI parameter --root, which can point to /var/lib/containerd/io.containerd.snapshotter.v1.nydus.

changweige avatar Feb 21 '23 08:02 changweige

Thanks for your reply. The default value of the tag in the helm chart I use is v0.4.0. Following your suggestion, I will try to modify the NYDUS_LIB value in the daemonset env. BTW, can the helm chart work with version 0.5.1?

Riverdd avatar Feb 21 '23 10:02 Riverdd

Yes, 0.5.1 works.

changweige avatar Feb 21 '23 10:02 changweige

I rebuilt the v0.5.1 image with a modified root path, and it runs without problems, but the following error is reported when creating a pod.

Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/34/sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816" bucket: not found

I checked the /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256 path and there is no such file. I guess the metadata is wrong somewhere?

Riverdd avatar Feb 22 '23 07:02 Riverdd

Is the image that you are running in OCI format or nydus format? This looks like an error from containerd. Can you make sure that containerd is using nydus-snapshotter?

changweige avatar Feb 22 '23 07:02 changweige

@Riverdd If you have a boltdb inspector such as https://github.com/br0xen/boltbrowser, please inspect nydus-snapshotter's snapshots DB, located at ROOT/metadata.db, and check whether the missing parent is recorded there. The ctr snapshots command should also help to analyze this problem.

changweige avatar Feb 22 '23 07:02 changweige

I am using the sandbox image in OCI format. Can I use an OCI-format image when the containerd snapshotter is configured as nydus?

Riverdd avatar Feb 22 '23 07:02 Riverdd

Yes. Nydus-snapshotter is compatible with OCI images, which means an OCI image can be unpacked into the snapshots created by nydus-snapshotter.

Aha, I guess the cause of your issue is that the ROOT dir was changed, and that is where nydus-snapshotter stores its snapshots DB, metadata.db. I suppose you can try to migrate the DB file from the old location to the new location. Also try to migrate the nydus.db file.

changweige avatar Feb 22 '23 07:02 changweige

That looks right, but I have already deleted the old path.

Riverdd avatar Feb 22 '23 08:02 Riverdd

Hi @Riverdd, would it be convenient for you to join DingTalk group 34971767?

imeoer avatar Feb 22 '23 08:02 imeoer

OK, let's change the channel.

Riverdd avatar Feb 22 '23 08:02 Riverdd

Related fixup on containerd: https://github.com/containerd/containerd/pull/10127

imeoer avatar May 06 '24 06:05 imeoer

But we had better still set the snapshotter's default root path to /var/lib/containerd/io.containerd.snapshotter.v1.nydus for better compatibility.

imeoer avatar Jun 19 '24 01:06 imeoer