
Talos v1.6.6 - Storage goes invalid after we apply ZFS extension

Open Rammurthy5 opened this issue 1 year ago • 7 comments

Bug Report

Storage on worker nodes goes invalid after we apply the ZFS extension

Description

Storage on worker nodes goes invalid after we apply the ZFS extension. This is on the AWS EC2 platform, with Cilium CNI and KubeSpan.

Logs

Environment

  • Talos version: 1.6.6
  • Kubernetes version: 1.28.3
  • Platform: AWS EC2

Rammurthy5 • May 29 '24

Please provide a detailed report on what is going on exactly.

smira • May 29 '24

@smira,

I created a brand-new Talos cluster on AWS EC2 with the ZFS and iSCSI extensions installed, plus the hugepages and nvme kernel module configuration listed in the Mayastor requirements. Everything worked until I added ZFS: as soon as this extension is added to the workers, the 0-storage issue occurs.

In another attempt, I used only ZFS, without the iSCSI extension or the hugepages configuration. openebs-mayastor was disabled, but the ZFS pods were running. I was unable to create storage with ZFS because the invalid-storage issue kept occurring on the worker nodes.
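The node-level prerequisites mentioned here can be expressed as a Talos machine config patch. The sketch below is a minimal, hypothetical helper that emits such a patch as JSON; since JSON is valid YAML, it should be usable as a strategic merge patch (an assumption worth verifying). The module names `zfs` and `nvme_tcp` and the hugepage count of 1024 are assumptions based on common ZFS extension and Mayastor setups, not values taken from this thread. The extensions themselves still have to be part of the node's installer image; this patch only loads the modules and sets the sysctl.

```python
"""Hypothetical helper: emit a Talos machine config patch for ZFS + Mayastor prerequisites."""
import json
import sys

# Assumed defaults: module names commonly used for the ZFS extension and for
# Mayastor's NVMe-oF TCP transport; 1024 x 2 MiB hugepages is an assumed value.
DEFAULT_MODULES = ("zfs", "nvme_tcp")
DEFAULT_HUGEPAGES = 1024


def build_patch(modules=DEFAULT_MODULES, hugepages=DEFAULT_HUGEPAGES):
    """Return a strategic-merge style patch as a plain dict."""
    if hugepages < 0:
        raise ValueError("hugepages must be non-negative")
    return {
        "machine": {
            # Ask Talos to load these kernel modules.
            "kernel": {"modules": [{"name": name} for name in modules]},
            # Talos sysctl values are strings.
            "sysctls": {"vm.nr_hugepages": str(hugepages)},
        }
    }


if __name__ == "__main__":
    # Optional first argument overrides the hugepage count.
    pages = int(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_HUGEPAGES
    json.dump(build_patch(hugepages=pages), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Usage, assuming the script is saved as `zfs_patch.py`: `python zfs_patch.py > patch.json`, then `talosctl patch machineconfig --nodes <worker-ip> --patch @patch.json`. Keeping the patch small makes it easy to apply with and without the Mayastor parts when bisecting which change triggers the storage issue.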

Brief logs:

"Failed to get the info of the filesystem with mountpoint","mountpoint":"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs","err":"unable to find data in memory cache"}

"Image garbage collection failed once. Stats initialization may not have completed yet","err":"invalid capacity 0 on image filesystem"}

"error":"PLEG is not healthy: pleg has yet to be successful"}]}

Rammurthy5 • May 31 '24
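The first two log lines say the kubelet could not get filesystem information for containerd's overlayfs snapshotter directory and therefore saw an image filesystem capacity of 0. One way to dig further is to check which mount backs that directory on the node and what size it reports. The sketch below is a hypothetical diagnostic, not the kubelet's or cAdvisor's own logic: it parses a mountinfo file (field layout per proc(5)), picks the longest mount point containing the path, and reports the filesystem type, source, and `statvfs` capacity. Because Talos has no shell, it assumes it runs in a privileged pod with `hostPID: true`, reading `/proc/1/mountinfo` and resolving the path through `/proc/1/root` to see the host's view.

```python
"""Hypothetical diagnostic: which mount backs a path, and what capacity does statvfs report?"""
import os
import re
import sys
from typing import List, NamedTuple, Optional


class Mount(NamedTuple):
    mount_point: str
    fs_type: str
    source: str


_OCTAL = re.compile(r"\\([0-7]{3})")


def unescape(field: str) -> str:
    """mountinfo encodes space, tab, newline and backslash as octal escapes like \\040."""
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo(text: str) -> List[Mount]:
    mounts = []
    for line in text.splitlines():
        # Fields before " - " describe the mount; after it: fs type, source, super options.
        pre, sep, post = line.partition(" - ")
        if not sep:
            continue
        pre_fields, post_fields = pre.split(), post.split()
        if len(pre_fields) < 5 or not post_fields:
            continue
        source = unescape(post_fields[1]) if len(post_fields) > 1 else ""
        mounts.append(Mount(unescape(pre_fields[4]), post_fields[0], source))
    return mounts


def find_mount(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Longest mount point containing path; later entries win ties (stacked mounts)."""
    path = os.path.normpath(path)
    best = None
    for m in mounts:
        prefix = m.mount_point.rstrip("/") + "/"
        if path == m.mount_point or path.startswith(prefix):
            if best is None or len(m.mount_point) >= len(best.mount_point):
                best = m
    return best


def capacity_bytes(path: str) -> int:
    """Total size of the filesystem containing path, as statvfs reports it."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize


if __name__ == "__main__":
    # Assumed defaults: the snapshotter path from the logs, and PID 1's view of the host.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
    proc = sys.argv[2] if len(sys.argv) > 2 else "/proc/1"
    with open(os.path.join(proc, "mountinfo")) as f:
        mount = find_mount(target, parse_mountinfo(f.read()))
    print("backing mount:", mount)
    # Resolve the path through PID 1's root so statvfs sees the host filesystem.
    print("capacity bytes:", capacity_bytes(os.path.join(proc, "root") + target))
```

Comparing the output on a worker with and without the ZFS extension would show whether the backing mount or its reported size changes, which narrows down where the zero capacity comes from.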

You might need to dig further; the ZFS extension itself works (we have integration tests), so something else is going on further down the line. I'm not sure how ZFS would affect containerd exactly, or what kind of configuration you're trying to do.

smira • May 31 '24

@smira, could you share the steps you followed to get ZFS fully working, please? 🙇🏻 I'll follow the same steps and see if it helps.

Rammurthy5 • May 31 '24

https://github.com/siderolabs/talos/blob/9d395b9de94f28fb9bf56bf795f916f783a847a0/internal/integration/api/extensions_qemu.go#L555-L713

Here is the code from the integration test. The ZFS extension is a community project, so you might need to reach out for community help here.

smira • Jun 03 '24

Hi @smira, how do we install zpool on Talos workers? I couldn't find anything in the Talos docs. I already have the ZFS extension and kernel module configuration in place, and the LocalPV provisioner and ZFS controller are running. I need to install zpool and create zpools.

Rammurthy5 • Jun 14 '24

ZFS is a community extension, I don't have any specific examples at the moment besides what I posted above.

smira • Jun 14 '24
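On the zpool question: as far as we understand, the ZFS extension ships the userland tools on the host, so there is nothing separate to install (check the extension's README for where `zpool` is placed). Because Talos has no shell or SSH, one pattern is to run the host's `zpool` from a privileged pod with `hostPID: true`, using `nsenter --mount=/proc/1/ns/mnt` to enter the host's mount namespace. The sketch below is a hypothetical generator for such a one-shot Pod manifest. The image, namespace, device path, and the requirement that the mount point live under `/var` (Talos's root filesystem is read-only) are assumptions, not steps from this thread.

```python
"""Hypothetical generator for a one-shot Pod that runs the host's zpool via nsenter."""
import json
import sys


def zpool_create_pod(node, pool, device, mountpoint, image="alpine:3.19", namespace="kube-system"):
    # Assumption: Talos's root filesystem is read-only, so keep the pool's mount point under /var.
    if not mountpoint.startswith("/var/"):
        raise ValueError("mountpoint should be under /var/ on Talos")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"zpool-create-{pool}", "namespace": namespace},
        "spec": {
            "nodeName": node,          # pin to the worker that owns the disk
            "hostPID": True,           # makes the host's PID 1 visible as /proc/1
            "restartPolicy": "Never",  # run once; zpool create is not idempotent
            "containers": [{
                "name": "zpool",
                "image": image,        # any image that ships nsenter (assumed)
                "securityContext": {"privileged": True},
                # Enter PID 1's mount namespace so the host's zpool binary and /dev are used.
                # Assumes zpool is in a directory on the container's PATH; else use its full path.
                "command": ["nsenter", "--mount=/proc/1/ns/mnt", "--",
                            "zpool", "create", "-m", mountpoint, pool, device],
            }],
        },
    }


if __name__ == "__main__":
    if len(sys.argv) != 5:
        sys.exit("usage: zpool_pod.py NODE POOL DEVICE MOUNTPOINT")
    json.dump(zpool_create_pod(*sys.argv[1:]), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Usage, with illustrative names: `python zpool_pod.py <node-name> tank /dev/nvme1n1 /var/mnt/tank | kubectl apply -f -`, then `kubectl -n kube-system logs zpool-create-tank`. The device name is only an example (extra EBS volumes on Nitro instances usually appear as NVMe devices). `kube-system` is used on the assumption that Talos's default Pod Security admission configuration exempts it; verify this for your cluster before running privileged pods.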

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Feb 14 '25

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] • Feb 19 '25