
[Clone from Snapshot] CloneCommand never sent

Open PrivatePuffin opened this issue 1 year ago • 37 comments

What steps did you take and what happened: When creating a PVC as a clone from a snapshot of another PVC, the clone command never gets sent.

ZFS command log:

2024-05-19.17:46:29 zfs create -o quota=274877906944 -o recordsize=64k -o dedup=off -o compression=zstd-6 -o encryption=off -o mountpoint=legacy NVME/app-pvc/pvc-b44a3c8a-4cae-4d63-aed1-fafe853f8e8b
2024-05-19.17:46:29 zfs create -o quota=1073741824 -o recordsize=64k -o dedup=off -o compression=zstd-6 -o encryption=off -o mountpoint=legacy NVME/app-pvc/pvc-176eff42-3b84-4ce6-8fe9-a06192320869
2024-05-19.17:46:52 zfs snapshot NVME/app-pvc/pvc-b44a3c8a-4cae-4d63-aed1-fafe853f8e8b@snapshot-08240c13-7fcb-4425-838d-79eacfc27407
2024-05-19.18:57:04 zfs clone NVME/app-pvc/pvc-b44a3c8a-4cae-4d63-aed1-fafe853f8e8b@snapshot-08240c13-7fcb-4425-838d-79eacfc27407 NVME/app-pvc/testclone

Notice "testclone" which was ran manually, the log never showed any attempt at running the clone command, successfull or otherwise.

What did you expect to happen:

It should at least send the clone command, regardless of failure or success.

The output of the following commands will help us better understand what's going on:

  • All OpenEBS ZFS-LocalPV pods and containers are running, healthy, on the latest version, and able to create normal PVC-defined datasets.
  • zfsvolumes cr for origin PVC is present
  • zfssnapshot cr for originPVC is present
  • zfs snapshot itself for origin PVC is present
  • zfs snapshot itself for origin PVC can be cloned manually
  • zfsvolume for targetPVC is present
  • dataset for targetPVC is missing
  • clone command never sent

zfsvolume cr for targetPVC

Name:         pvc-095889bb-2da5-4b40-838e-d54c499b2ab0
Annotations:  <none>
API Version:  zfs.openebs.io/v1
Kind:         ZFSVolume
Spec:
  Capacity:        274877906944
  Compression:     zstd-6
  Dedup:           off
  Encryption:      off
  Fs Type:         zfs
  Owner Node ID:   ix-truenas
  Pool Name:       NVME/app-pvc
  Recordsize:      64k
  Shared:          yes
  Snapname:        pvc-b44a3c8a-4cae-4d63-aed1-fafe853f8e8b@snapshot-08240c13-7fcb-4425-838d-79eacfc27407
  Thin Provision:  yes
  Volume Type:     DATASET
Status:
  State:  Failed
Events:   <none>

The zfsvolume for the origin is all correct. Target and origin share the same (known-good) storageClass.

  • kubectl logs -f openebs-zfs-controller-f78f7467c-blr7q -n openebs -c openebs-zfs-plugin: Gist logs are still to be added, but the highlights: the error is just failing on the Dataset existence check after creation of the ZFSVolume CR, and no descriptive error is output in any of them. Just the fact that volume creation failed, which we can also already see on the zfsvolume CR.

  • kubectl logs -f openebs-zfs-node-[xxxx] -n openebs -c openebs-zfs-plugin https://gist.github.com/Ornias1993/9cb23dc0df026233e8d64c74d70bd39a https://gist.github.com/Ornias1993/991e63fbc41fd68e71a35c2f8f2e3a62

Anything else you would like to add:

Everything else works perfectly fine and the PVC object itself is also created fine. Other PVC creation works fine. All OpenEBS pods and components are fine, volumesnapshot objects are present and fine, as well as the ZFS snapshots.

But regardless of this, at least the clone command should've been sent.

Environment:

  • Verified Latest
  • Verified Latest
  • Platform independent (tested multiple)
  • Debian/TrueNAS/and-others

PrivatePuffin avatar May 19 '24 17:05 PrivatePuffin

I've also noticed that the clone feature support is not tested in any way either, or at least I don't see it being run.

That being said, with the lack of logging it's REALLY hard to trace this bug.

PrivatePuffin avatar May 19 '24 17:05 PrivatePuffin

I've also noticed that the clone feature support is not tested in any way either, or at least I don't see it being run.

I took a few notes the other day of what should be done about the lacking tests for the local engines. I'll write it up on a ticket so we can track progress on addressing it. Some e2e-test infra was also lost with the archival, so that will have to be replaced somehow.

tiagolobocastro avatar May 20 '24 10:05 tiagolobocastro

I took a few notes the other day of what should be done about the lacking tests for the local engines.

Ahh good to see activity on that!

I'll write it up on a ticket so we can track progress on addressing it.

Yes that would be great as well!

Some e2e-test infra was also lost with the archival, so that will have to be replaced somehow.

I noticed, I think it might be best to fix the tests first and go from there? If this issue is correct, it would show itself when tests get implemented anyway...


Edit/Addition

On another note there is the lack of verbosity though... A lot of errors come down to "no worky"; it would be nice to at least more easily trace it back to the step in the provisioning/creation process that's failing.

PrivatePuffin avatar May 20 '24 10:05 PrivatePuffin

Yes that would be great as well!

I've raised https://github.com/openebs/openebs/issues/3726 Hopefully more folks will pitch in, correct anything I've said wrong and provide more details.

I noticed, I think it might be best to fix the tests first and go from there? If this issue is correct, it would show itself when tests get implemented anyway...

Yes that makes sense. I'd also like to see how much of these local engines' (lvm, zfs...) tests we can "unify". Basic CSI etc. functionality should work exactly the same, so presumably the same basic tests should apply to all?

On another note there is the lack of verbosity though... A lot of errors come down to "no worky"; it would be nice to at least more easily trace it back to the step in the provisioning/creation process that's failing.

So you're saying even if you increase verbosity there's nothing else being added? Maybe we can track this as an enhancement too. CC @sinhaashish

EDIT: sorry mouse slip...

tiagolobocastro avatar May 20 '24 12:05 tiagolobocastro

On another note there is the lack of verbosity though... A lot of errors come down to "no worky"; it would be nice to at least more easily trace it back to the step in the provisioning/creation process that's failing.

So you're saying even if you increase verbosity there's nothing else being added? Maybe we can track this as an enhancement too. CC @sinhaashish

EDIT: sorry mouse slip...

Is there even a verbosity option?

PrivatePuffin avatar May 20 '24 13:05 PrivatePuffin

Doubt verbosity would've helped much in this case btw, as I manually tracked the log messages through the code and it never even hit the function that should've sent the zfs-clone command.

PrivatePuffin avatar May 20 '24 13:05 PrivatePuffin

Is there even a verbosity option?

sigh, sorry my mistake for making this assumption, seems there's not..

tiagolobocastro avatar May 20 '24 13:05 tiagolobocastro

@jnels124 Can you confirm here that you also see this behavior?

PrivatePuffin avatar May 26 '24 19:05 PrivatePuffin

Hey @Ornias1993, I tried to reproduce this issue but it seems to be working for me. I tried on Debian 12, k8s 1.26.15, with zfs-2.1.11-1. I am able to successfully create a clone out of a snapshot which was created on a thin-provisioned, shared mount volume.

Can I get more specific details so that I can see how to reproduce this issue (what exact steps you performed, with YAML if possible)?

Logs don't seem to be useful here; I just see a bunch of the following lines repeating for different PVCs:

Started PVC processing "ix-authelia/authelia-cnpg-main-1-wal"
No need to resize PVC "ix-wg-easy/wg-easy-config"

w3aman avatar May 29 '24 11:05 w3aman

Logs don't seem to be useful here

I've not added all logs yet, but the issue is also that there is no verbosity at all. So the logs are, sorry to say, completely useless crap.

Those "this no worky" errors are not useable log output anyway.

I just see a bunch of the following lines repeating for different PVCs

Obviously this is a running system so the logs do contain things that aren't related.

Can I get more specific details so that I can see how to reproduce this issue

Primarily the use is through VolSync, so I cannot provide YAML for you. However, I verified everything on the VolSync side is green; it's just waiting on the clone-from-snapshot to be created by OpenEBS. Which, obviously, never happens as it errors out.

Otherwise everything works perfectly fine, except that the PVC VolSync creates throws a vague error and OpenEBS never executes the clone command. But due to the insane lack of any way of tracing it (because there is no verbosity anywhere), I've no way of even trying to fix it myself.

PrivatePuffin avatar May 29 '24 11:05 PrivatePuffin

@w3aman Can you share your way of testing this in yaml format?

PrivatePuffin avatar May 29 '24 11:05 PrivatePuffin

More logs: https://gist.github.com/Ornias1993/2a9f9b32d82b5720a3ae3917486c48ea

https://gist.github.com/Ornias1993/903229b76a44da17208753519e9b755c

PrivatePuffin avatar May 29 '24 12:05 PrivatePuffin

@w3aman Can you share your way of testing this in yaml format?

I tried to replicate the spec from your zfsvolume; these are my YAMLs (zfs_yaml.txt) and they work perfectly fine.

here is my zfsvolume for target pvc:

Name:         pvc-ef52066b-fd78-42c6-ab1f-9a08d1e94da1
Namespace:    openebs
Labels:       kubernetes.io/nodename=debian-worker
Annotations:  <none>
API Version:  zfs.openebs.io/v1
Kind:         ZFSVolume
Metadata:
  Creation Timestamp:  2024-05-29T11:46:47Z
  Finalizers:
    zfs.openebs.io/finalizer
  Generation:        2
  Resource Version:  14262
  UID:               8d15a359-1a88-440c-9c06-bd43e771ef54
Spec:
  Capacity:        2147483648
  Compression:     zstd-6
  Dedup:           off
  Fs Type:         zfs
  Owner Node ID:   debian-worker
  Pool Name:       NVME/app-pvc
  Recordsize:      64k
  Shared:          yes
  Snapname:        pvc-b56509a2-2d22-4419-b6e1-78084883aaa6@snapshot-8c0db664-5cb6-4e9f-93b1-32ff009cfdfb
  Thin Provision:  yes
  Volume Type:     DATASET
Status:
  State:  Ready
Events:   <none>


w3aman avatar May 29 '24 13:05 w3aman

Can you send me the output for this command for the node agent pod on the node ix-truenas?

kubectl logs openebs-zfs-node-[xxxx] -n openebs -c openebs-zfs-plugin | grep pvc-095889bb-2da5-4b40-838e-d54c499b2ab0

w3aman avatar May 29 '24 13:05 w3aman

I already uploaded those logs: https://gist.github.com/Ornias1993/991e63fbc41fd68e71a35c2f8f2e3a62

It's not my personal system; I went through a troubleshooting session with one of our users.

PrivatePuffin avatar May 29 '24 15:05 PrivatePuffin

@Ornias1993 I do see this behavior as well. However, I am creating the snapshots with different tooling. We deploy CitusDB (a PostgreSQL extension) via StackGres. StackGres provides the snapshotting ability.

  1. We create the snapshots in StackGres... This creates the appropriate k8s volumesnapshot resource, which in turn creates the appropriate zfssnapshot resource. When restoring, the correct zfsvolume definition is created and references the snapshot. Give me some time to get some of the old logs together and provide replication steps.

jnels124 avatar May 31 '24 19:05 jnels124

@Ornias1993 are you setting the openebs.io/nodeid label to something other than the node name?

jnels124 avatar May 31 '24 20:05 jnels124

@Abhinandan-Purkait

In my case, I believe the issue is caused by the openebs.io/nodeid being different from the nodename.

The issue appears to be with driver/controller.go and the CreateVolume method

This block is problematic, because the value of selected in the first two branches comes from the resource spec OwnerNodeId, while in the last case selected will be the nodename.

if contentSource != nil && contentSource.GetSnapshot() != nil {
		snapshotID := contentSource.GetSnapshot().GetSnapshotId()

		selected, err = CreateSnapClone(ctx, req, snapshotID)
	} else if contentSource != nil && contentSource.GetVolume() != nil {
		srcVol := contentSource.GetVolume().GetVolumeId()
		selected, err = CreateVolClone(ctx, req, srcVol)
	} else {
		selected, err = CreateZFSVolume(ctx, req)
	}

This means that we are able to create fresh volumes correctly, but not volume clones or snapshot clones, because a few lines down we have:

nodeid, err := zfs.GetNodeID(selected)

That method will fail when the value of selected is the OwnerNodeId. The code should be updated so that selected is populated from the correct field.
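For illustration, here is roughly what I understand that lookup to do, expressed as a self-contained sketch using plain client-go (this is an assumption from reading the code and the logs, not the actual volume.go source; the getNodeID name and the fallback behaviour here are illustrative):

package driver

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getNodeID sketches the assumed behaviour of zfs.GetNodeID: look the node up by
// its Kubernetes object name, then read the openebs.io/nodeid label off it.
func getNodeID(ctx context.Context, cs kubernetes.Interface, nodeName string) (string, error) {
	// Lookup is by object name, so passing an openebs.io/nodeid label value
	// (as happens on the clone/restore path) fails right here with
	// "failed to get the node <value>".
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return "", fmt.Errorf("failed to get the node %s: %v", nodeName, err)
	}
	// Only after a successful lookup is the label consulted.
	if id, ok := node.Labels["openebs.io/nodeid"]; ok {
		return id, nil
	}
	// Assumed fallback when the label is absent: the node name itself.
	return nodeName, nil
}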

jnels124 avatar May 31 '24 21:05 jnels124

@Ornias1993 are you setting the openebs.io/nodeid label to something other than the node name?

Interesting, I'm not explicitly doing that, but maybe VolSync is doing that? Is there an issue with that label NOT being set?

PrivatePuffin avatar Jun 01 '24 17:06 PrivatePuffin

@Abhinandan-Purkait

In my case, I believe the issue is caused by the openebs.io/nodeid being different from the nodename.

The issue appears to be with driver/controller.go and the CreateVolume method

This block is problematic, because the value of selected in the first two branches comes from the resource spec OwnerNodeId, while in the last case selected will be the nodename.

if contentSource != nil && contentSource.GetSnapshot() != nil {
		snapshotID := contentSource.GetSnapshot().GetSnapshotId()

		selected, err = CreateSnapClone(ctx, req, snapshotID)
	} else if contentSource != nil && contentSource.GetVolume() != nil {
		srcVol := contentSource.GetVolume().GetVolumeId()
		selected, err = CreateVolClone(ctx, req, srcVol)
	} else {
		selected, err = CreateZFSVolume(ctx, req)
	}

This means that we are able to create fresh volumes correctly, but not volume clones or snapshot clones, because a few lines down we have:

nodeid, err := zfs.GetNodeID(selected)

That method will fail when the value of selected is the OwnerNodeId. The code should be updated so that selected is populated from the correct field.

I believe the reason for using the OwnerNodeId is to ensure that we select the node where the parent volume's pool is present, since snapshot, restore and clone all need to go to the same node.

You said that the nodename is different from openebs.io/nodeid's value. From the code it seems like we fetch the node before setting the topology, and if that succeeds we set this to openebs.io/nodeid. Can you point out how we can end up in your situation?

node, err := k8sapi.GetNode(ns.driver.config.Nodename)
if err != nil {
  klog.Errorf("failed to get the node %s", ns.driver.config.Nodename)
  return nil, err
}

if _, ok := topology[zfs.ZFSTopologyKey]; !ok {
  topology[zfs.ZFSTopologyKey] = ns.driver.config.Nodename
}

Abhinandan-Purkait avatar Jun 03 '24 06:06 Abhinandan-Purkait

@Abhinandan-Purkait We pre-label (before starting the zfs components) our nodes with unique identifiers for openebs.io/nodeid, and these identifiers differ from the node name. So in the snippet you provided, topology[zfs.ZFSTopologyKey] will already be set (ok will be true) and therefore will not be reassigned to ns.driver.config.Nodename. See the block above the one you posted, which initializes the topology map from the node labels when the allowed topologies is "all", as is the case in both the default config and our config.
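Roughly, that earlier block looks something like this (a sketch from memory to show why ok ends up true for us; not the exact source):

// When the allowed topologies is "all", the topology map gets seeded from the
// node's labels, so our pre-set openebs.io/nodeid label is already in the map.
topology := map[string]string{}
for key, value := range node.Labels {
	topology[key] = value
}

// ...which means the fallback to the nodename below never runs on our nodes:
if _, ok := topology[zfs.ZFSTopologyKey]; !ok {
	topology[zfs.ZFSTopologyKey] = ns.driver.config.Nodename
}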

So with this configuration, I am able to create new ZFS volumes without issue. However, when you attempt to create a zfs volume from a snapshot definition it fails.

In the logs I can see "failed to get the node {VALUE_OF_OPENEBS_NODEID}", which highlights the issue mentioned in my previous comment. This GetNodeID method in volume.go only works when it is passed the nodename.

In the case of creating a new volume this method is called with the nodename, but when it is from a volume clone or a snapshot it gets called with the value of openebs.io/nodeid. This is because of this block in driver/controller.go in the CreateVolume method:

if contentSource != nil && contentSource.GetSnapshot() != nil {
		snapshotID := contentSource.GetSnapshot().GetSnapshotId()

		selected, err = CreateSnapClone(ctx, req, snapshotID)
	} else if contentSource != nil && contentSource.GetVolume() != nil {
		srcVol := contentSource.GetVolume().GetVolumeId()
		selected, err = CreateVolClone(ctx, req, srcVol)
	} else {
		selected, err = CreateZFSVolume(ctx, req)
	}

CreateSnapClone --> returns spec.OwnerNodeId
CreateVolClone --> returns spec.OwnerNodeId
CreateZFSVolume --> returns the nodename loop variable

Replication steps:

  1. Label node(s) with unique value for openebs.io/nodeid
  2. Deploy zfs components
  3. Create a pvc with zfs storage class
  4. Identify volume created successfully
  5. Create a snapshot and ensure the appropriate zfs and k8s resources exist
  6. Create a new pvc defined to be created from snapshot
  7. Notice zfs volume created and after 10 minutes goes to pending
  8. Logs will indicate unable to find node with nodeid value from the owning node
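To make the fix direction concrete, here is one possible approach as a purely illustrative sketch (plain client-go, hypothetical helper name; not the project's actual code): resolve whatever value ends up in selected back to a node object name before handing it to zfs.GetNodeID.

package driver

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// resolveNodeName is hypothetical: it accepts either a node object name or an
// openebs.io/nodeid label value and returns the node object name in both cases.
func resolveNodeName(ctx context.Context, cs kubernetes.Interface, selected string) (string, error) {
	// Fast path: "selected" is already the node's object name (fresh-volume path).
	if _, err := cs.CoreV1().Nodes().Get(ctx, selected, metav1.GetOptions{}); err == nil {
		return selected, nil
	}
	// Otherwise treat it as the openebs.io/nodeid label value (clone/restore path)
	// and look the node up by label selector.
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		LabelSelector: "openebs.io/nodeid=" + selected,
	})
	if err != nil {
		return "", err
	}
	if len(nodes.Items) == 0 {
		return "", fmt.Errorf("no node found for nodeid %q", selected)
	}
	return nodes.Items[0].Name, nil
}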

jnels124 avatar Jun 03 '24 16:06 jnels124

@Ornias1993 are you setting the openebs.io/nodeid label to something other than the node name?

Interesting, i'm not explicitly doing that, but maybe VolSync is doing that? Is there an issue in that label NOT being set?

I reviewed some of your logs that I didn't see before and saw that these two values match in your case. You can ignore that.

jnels124 avatar Jun 03 '24 18:06 jnels124

@Abhinandan-Purkait We pre-label (before starting the zfs components) our nodes with unique identifiers for openebs.io/nodeid, and these identifiers differ from the node name. So in the snippet you provided, topology[zfs.ZFSTopologyKey] will already be set (ok will be true) and therefore will not be reassigned to ns.driver.config.Nodename. See the block above the one you posted, which initializes the topology map from the node labels when the allowed topologies is "all", as is the case in both the default config and our config.

So with this configuration, I am able to create new ZFS volumes without issue. However, when you attempt to create a zfs volume from a snapshot definition it fails.

In the logs I can see "failed to get the node {VALUE_OF_OPENEBS_NODEID}", which highlights the issue mentioned in my previous comment. This GetNodeID method in volume.go only works when it is passed the nodename.

In the case of creating a new volume this method is called with the nodename, but when it is from a volume clone or a snapshot it gets called with the value of openebs.io/nodeid. This is because of this block in driver/controller.go in the CreateVolume method:

if contentSource != nil && contentSource.GetSnapshot() != nil {
		snapshotID := contentSource.GetSnapshot().GetSnapshotId()

		selected, err = CreateSnapClone(ctx, req, snapshotID)
	} else if contentSource != nil && contentSource.GetVolume() != nil {
		srcVol := contentSource.GetVolume().GetVolumeId()
		selected, err = CreateVolClone(ctx, req, srcVol)
	} else {
		selected, err = CreateZFSVolume(ctx, req)
	}

CreateSnapClone --> returns spec.OwnerNodeId
CreateVolClone --> returns spec.OwnerNodeId
CreateZFSVolume --> returns the nodename loop variable

Replication steps:

  1. Label node(s) with unique value for openebs.io/nodeid
  2. Deploy zfs components
  3. Create a pvc with zfs storage class
  4. Identify volume created successfully
  5. Create a snapshot and ensure the appropriate zfs and k8s resources exist
  6. Create a new pvc defined to be created from snapshot
  7. Notice zfs volume created and after 10 minutes goes to pending
  8. Logs will indicate unable to find node with nodeid value from the owning node

@jnels124 Thanks for explaining your use case. It seems pre-labeling the node was not a part of the design, so as per the current design I feel it's not supposed to work; or do you think the way it works is fundamentally wrong and needs refactoring? Can you please create an enhancement or refactoring issue and explain what can be done here so that it can be worked on?

Having said that, there are two issues here: one is @jnels124's use case and the other is the one reported by @Ornias1993. They seem to have different scopes and might need different resolutions.

Unfortunately we haven't been able to reproduce the latter, even with the same volume specs and environment, as reported by @w3aman.

Abhinandan-Purkait avatar Jun 06 '24 05:06 Abhinandan-Purkait

Unfortunately we haven't been able to reproduce the latter, even with the same volume specs and environment, as reported by @w3aman.

Yeah that's a shame for sure... Haven't had the time to look into it further either, as we're currently putting a lot of work into moving away from OpenEBS ZFS-LocalPV due to this issue.

Worth noting: you can 100% reproduce it when using VolSync, and all the resources created by VolSync are 100% correct. It simply keeps waiting on OpenEBS ZFS-LocalPV to clone, which never happens.


btw @Abhinandan-Purkait I still owe you an apology for my previous remark calling you an intern and incompetent. In hindsight, I shouldn't have blamed you solely and personally.

PrivatePuffin avatar Jun 06 '24 06:06 PrivatePuffin

Haven't had the time to look into it further either, as we're currently putting a lot of work into moving away from OpenEBS ZFS-LocalPV due to this issue.

That's unfortunate. Alright, we can try this out with VolSync and let you know the findings.

Abhinandan-Purkait avatar Jun 06 '24 07:06 Abhinandan-Purkait

Haven't had the time to look into it further either, as we're currently putting a lot of work into moving away from OpenEBS ZFS-LocalPV due to this issue.

That's unfortunate. Alright, we can try this out with VolSync and let you know the findings.

That would be completely awesome, thank you! My best educated guess would be a race condition between... well... something...

I see the request being filed for the clone, but it just... well... doesn't pass through the complete codepath... When you've some spare time, it might be worthwhile having some extra verbosity (flag), so we can more easily trace where in the path issues like this occur next time.

PrivatePuffin avatar Jun 06 '24 08:06 PrivatePuffin

Haven't had the time to look into it further either, as we're currently putting a lot of work into moving away from OpenEBS ZFS-LocalPV due to this issue.

That's unfortunate. Alright, we can try this out with VolSync and let you know the findings.

That would be completely awesome, thank you! My best educated guess would be a race condition between... well... something...

I see the request being filed for the clone, but it just... well... doesn't pass through the complete codepath... When you've some spare time, it might be worthwhile having some extra verbosity (flag), so we can more easily trace where in the path issues like this occur next time.

Can you provide some information about the VolSync configuration? Like which replication method is being used, and which copyMethod is being used for it.

Abhinandan-Purkait avatar Jun 11 '24 14:06 Abhinandan-Purkait

  • Restic, Snapshot, using the ReplicationDestination Volume Populator

PrivatePuffin avatar Jun 11 '24 14:06 PrivatePuffin

  • Restic, Snapshot, using the ReplicationDestination Volume Populator

Hi @Ornias1993, I tried using VolSync with Restic + a GCP bucket, and restore using the VolumePopulator with copyMethod Snapshot. I was not able to reproduce it. You can check below:

The configs used: https://gist.github.com/Abhinandan-Purkait/599df18f75c90dbb4042b352ec5a2509

The logs and the pv: https://gist.github.com/Abhinandan-Purkait/2825853a4499f21ff883ae5c031dcaeb

Abhinandan-Purkait avatar Jun 14 '24 09:06 Abhinandan-Purkait

copyMethod

Logically, because the copy method doesn't use snapshots at all. That's why I said snapshot, not copy ;-)

However, that's great information, because it limits where the issue is occurring to purely the clone-from-snapshot side of things.

PrivatePuffin avatar Jun 14 '24 10:06 PrivatePuffin