Unable to reinstate DiskPool when storage is moved to a new node
A small change to a node pool in Azure (changing the max pods parameter) causes Azure to update the underlying VMSS instances, which recreates the nodes with new hostnames. This leads to a catastrophic storage failure: even though the disks (which were not deleted) were automatically remounted to the new nodes on startup, the DiskPools are inoperable. Below are some of the errors observed in operator-diskpool:
operator-diskpool 2024-09-06T03:08:14.251234Z ERROR operator_diskpool::context: Pool not found, clearing status, name: "pool-aks-storage-16504667-vmss000002"
operator-diskpool at k8s/operators/src/pool/context.rs:230
operator-diskpool in operator_diskpool::context::pool_check with name: "pool-aks-storage-16504667-vmss000002", status: Some(DiskPoolStatus { cr_state: Created, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in operator_diskpool::reconcile with name: aks-storage-27073831-vmss000002, status: Some(DiskPoolStatus { cr_state: Created, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-aks-storage-16504667-vmss000002.core-openebs, object.reason: reconciler requested retry
operator-diskpool
operator-diskpool 2024-09-06T03:08:14.264426Z ERROR operator_diskpool::context: Pool not found, clearing status, name: "pool-aks-storage-16504667-vmss000001"
operator-diskpool at k8s/operators/src/pool/context.rs:230
operator-diskpool in operator_diskpool::context::pool_check with name: "pool-aks-storage-16504667-vmss000001", status: Some(DiskPoolStatus { cr_state: Created, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in operator_diskpool::reconcile with name: aks-storage-27073831-vmss000001, status: Some(DiskPoolStatus { cr_state: Created, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-aks-storage-16504667-vmss000001.core-openebs, object.reason: reconciler requested retry
operator-diskpool
agent-core-grpc-probe openebs-agent-core (10.0.31.254:50051) open
etcd-probe openebs-etcd (10.0.161.206:2379) open
Stream closed EOF for core-openebs/openebs-operator-diskpool-7c6fff4449-jrbwq (agent-core-grpc-probe)
Stream closed EOF for core-openebs/openebs-operator-diskpool-7c6fff4449-jrbwq (etcd-probe)
I tried deleting the first DiskPool resource and re-adding it in the hopes that it would help, but now I get the following message for that DiskPool:
operator-diskpool 2024-09-06T03:10:05.048011Z ERROR operator_diskpool::context: The block device(s): aio:////dev/mayastor?blk_size=4096 can not be found
operator-diskpool at k8s/operators/src/pool/context.rs:301
operator-diskpool in operator_diskpool::context::create_or_import with name: "pool-aks-storage-16504667-vmss000000", status: Some(DiskPoolStatus { cr_state: Creating, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in operator_diskpool::reconcile with name: aks-storage-27073831-vmss000000, status: Some(DiskPoolStatus { cr_state: Creating, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-aks-storage-16504667-vmss000000.core-openebs, object.reason: error policy requested retry
operator-diskpool
operator-diskpool 2024-09-06T03:10:05.048056Z WARN operator_diskpool: HTTP response error: error in response: status code '409 Conflict', content: 'RestJsonError { details: "", message: "SvcError :: Deleting: Pool Resource pending deletion - please retry", kind: Deleting }', retry scheduled @Fri, 6 Sep 2024 03:10:25 +0000 (20 seconds from now)
operator-diskpool at k8s/operators/src/pool/main.rs:55
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-aks-storage-16504667-vmss000000.core-openebs, object.reason: error policy requested retry
operator-diskpool
It seems this is an irrecoverable situation for the cluster, where we'd basically have to delete and recreate everything and restore all of the PV contents - obviously not a good situation.
Submitting this ticket at @tiagolobocastro's request to track the issue and hopefully come up with a way to migrate DiskPool resources to new nodes when the persistent storage was retained and attached to the new node.
@tiagolobocastro you mentioned in another ticket that there is a possible way to recover a cluster in this scenario that "wasn't user friendly" - would you mind outlining those steps here for the scenario where the storage nodes have been replaced by new nodes with new names, etcd remains fully operational (i.e. has the configuration from the original cluster), and the new nodes have the same physical disks connected to them that were connected to the old nodes (thus have the volume data on them). I would guess that it would involve some combination of replacing node / diskpool names in the etcd db and manually updating pv/pvc annotations to point to new nodes / diskpools?
Given the severity of the issue when this occurs, I think it would be worth outlining the manual steps as a last resort - and hopefully we can work a method into openebs to migrate old diskpools to new nodes (or migrate diskpools to new diskpools on new nodes) when storage remains intact.
I would guess that it would involve some combination of replacing node / diskpool names in the etcd db and manually updating pv/pvc annotations to point to new nodes / diskpools?
Yes, unfortunately we'd need to modify data in etcd to use the new node names... and the node name is in quite a few places:
AppNodes, example:
/openebs.io/mayastor/apis/v0/clusters/xxxxxx/namespaces/mayastor/AppNodeSpec/node-name:
{
"id": "node-name",
....
}
IoNodes:
/openebs.io/mayastor/apis/v0/clusters/xxxxx/namespaces/mayastor/NodeSpec/node-name:
{
"id": "node-name",
"node_nqn": "nqn.2019-05.io.openebs:node-name:node-name",
....
}
Pools:
/openebs.io/mayastor/apis/v0/clusters/xxxx/namespaces/mayastor/PoolSpec/node-name:
{
"node": "adm-cp0",
....
}
And nexuses:
/openebs.io/mayastor/apis/v0/clusters/xxxx/namespaces/mayastor/NexusSpec/xxxx:
{
"node": "node-name",
....
}
And also the DiskPool CRs, of course...
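Something like this can enumerate everything that still references the old node name (just a sketch; CLUSTER_ID, OLD_NODE and the key namespace are placeholders):
# List every Mayastor key whose value still mentions the old node name.
# CLUSTER_ID and OLD_NODE are placeholders, and the namespace segment may
# differ per install (mayastor vs openebs in this thread).
CLUSTER_ID="<cluster id>"
OLD_NODE="<old node name>"
PREFIX="/openebs.io/mayastor/apis/v0/clusters/$CLUSTER_ID/namespaces/mayastor"

for kind in AppNodeSpec NodeSpec PoolSpec NexusSpec; do
  etcdctl get --prefix "$PREFIX/$kind/" --keys-only | grep -v '^$' |
  while read -r key; do
    etcdctl get --print-value-only "$key" | grep -q "$OLD_NODE" && echo "$key"
  done
done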
Just wondering if we can handle this in another way, since these are supposed to be node names and not hostnames. So, for example, could you set the node-name to be the same as the previous one? @Abhinandan-Purkait any thoughts here?
@tiagolobocastro that is really helpful context. Why not have it automatically update those values when the node name changes in the DiskPool CR, i.e. if you have an existing DiskPool and the node name referenced by that DiskPool changes, it triggers those updates?
I don't think renaming the nodes simply to accommodate this would work for many people, as node names are often auto-generated by Kubernetes cloud infrastructure. If this could be done automatically, it would make DiskPools effectively portable (provided the underlying storage is moved as well).
example could you set the node-name to be the same as the previous one?
Yeah, that's a good point actually; that should work in theory. It's just a matter of replacing the node and bringing it back with the same name. But one thing that would probably be an issue is if the IP of the node is different, even though the name is made to be the same.
@tiagolobocastro that is really helpful context. Why not have it automatically update those values when the node name changes in the DiskPool CR, i.e. if you have an existing DiskPool and the node name referenced by that DiskPool changes, it triggers those updates?
I don't think renaming the nodes simply to accommodate this would work for many people, as node names are often auto-generated by Kubernetes cloud infrastructure. If this could be done automatically, it would make DiskPools effectively portable (provided the underlying storage is moved as well).
@dcaputo-harmoni Yes, ideally there should be a mechanism to just change pool ownership. We do have plans to work on it some time, iirc.
@tiagolobocastro @Abhinandan-Purkait Thanks for the feedback and comments. Honestly, I'm surprised that something this basic is not already built into such a widely used project as OpenEBS. The inability to change pool ownership results in a critical, unrecoverable situation, and changing node names (which necessitates changing pool ownership) is very common in cloud Kubernetes services such as Azure.
@tiagolobocastro @Abhinandan-Purkait I'm trying to develop a manual solution to this issue which I'll make available to the community as soon as it is complete so that others in the same situation can recover from it. I'm planning to tackle it on two levels: 1. Rename DiskPools, and 2. Rename nodes that DiskPools reference.
For the first one (renaming DiskPools), I wrote a bash script, run against etcd, that replaces all occurrences of the DiskPool names in etcd keys and values with the new names. That worked: it replaced 3 PoolSpec keys/values, as well as 138 ReplicaSpec values that referenced the old pool names.
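Roughly, that script does the following (a sketch rather than the exact script; OLD_POOL, NEW_POOL, the cluster id and the key namespace are placeholders, and snapshotting etcd first is strongly advised):
# Sketch: rewrite every etcd key/value that mentions the old pool name.
OLD_POOL="<old pool name>"
NEW_POOL="<new pool name>"
PREFIX="/openebs.io/mayastor/apis/v0/clusters/<cluster id>/namespaces/<namespace>"

etcdctl snapshot save /bitnami/etcd/snapshot-before-pool-rename.db

etcdctl get --prefix "$PREFIX" --keys-only | grep -v '^$' |
while read -r key; do
  value=$(etcdctl get --print-value-only "$key")
  # skip entries that don't mention the old pool name at all
  case "$key$value" in *"$OLD_POOL"*) ;; *) continue ;; esac
  new_key=${key//$OLD_POOL/$NEW_POOL}
  new_value=${value//$OLD_POOL/$NEW_POOL}
  etcdctl put "$new_key" -- "$new_value"
  # drop the old key only if the key itself was renamed
  [ "$new_key" != "$key" ] && etcdctl del "$key"
done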
Then I have a second script that exports the DiskPool CRs, modifies the names, and redeploys the CRs with the new names (it also removes the finalizer from the old DiskPools to allow deletion before recreation); see the sketch below. I then restarted operator-diskpool, but instead of picking up the new CRs, it created CRs with the old names. Any idea why it would do this / how to make it pick up the new ones?
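For reference, that second script is roughly the following (a sketch, assuming yq v4 is available; the pool names and namespace are placeholders):
# Sketch of the CR rename (all names here are placeholders).
OLD="<old diskpool name>"
NEW="<new diskpool name>"
NS="core-openebs"

# Export the old CR, rename it, and strip the server-managed fields.
kubectl -n "$NS" get diskpool "$OLD" -o yaml \
  | yq e ".metadata.name = \"$NEW\" | del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.creationTimestamp) | del(.status)" - \
  > "$NEW.yaml"

# Drop the finalizer so the old CR can actually be deleted, then swap them.
kubectl -n "$NS" patch diskpool "$OLD" --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl -n "$NS" delete diskpool "$OLD"
kubectl -n "$NS" apply -f "$NEW.yaml"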
The DiskPool CRs created with the new names were just sitting there in a "Creating" state and never progressed; however, the ones that operator-diskpool created with the old names seemed to come up fine. I verified that etcd has in fact been updated and there are no longer any references to the old DiskPool names. The cluster is working fine in the meantime, but I'm sure it's in a fragile state given these changes. Below are some relevant logs:
operator-diskpool:
operator-diskpool 2024-09-13T20:47:14.123286Z WARN operator_diskpool: HTTP response error: error in response: status code '400 Bad Request', content: 'RestJsonError { details: "create_pool::status: InvalidArgument, message: \": volume is busy, failed to import pool //dev/mayastor\", details: ], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Fri, 13 Sep 2024 20:47:14 GMT\", \"content-length\": \"0\"} }", message: "SvcError::GrpcRequestError", kind: InvalidArgument }', retry scheduled @Fri, 13 Sep 2024 20:47:34 +0000 (20 seconds from now)
operator-diskpool at k8s/operators/src/pool/main.rs:55
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-2.core-openebs, object.reason: error policy requested retry
operator-diskpool
operator-diskpool 2024-09-13T20:47:14.644280Z ERROR operator_diskpool::context: The block device(s): aio:////dev/mayastor?blk_size=4096 can not be found
operator-diskpool at k8s/operators/src/pool/context.rs:301
operator-diskpool in operator_diskpool::context::create_or_import with name: "pool-1", status: Some(DiskPoolStatus { cr_state: Creating, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in operator_diskpool::reconcile with name: aks-storage-14636457-vmss000001, status: Some(DiskPoolStatus { cr_state: Creating, pool_status: None, capacity: 0, used: 0, available: 0 })
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-1.core-openebs, object.reason: error policy requested retry
operator-diskpool
operator-diskpool 2024-09-13T20:47:14.644332Z WARN operator_diskpool: HTTP response error: error in response: status code '400 Bad Request', content: 'RestJsonError { details: "create_pool::status: InvalidArgument, message: \": volume is busy, failed to import pool //dev/mayastor\", details: ], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Fri, 13 Sep 2024 20:47:14 GMT\", \"content-length\": \"0\"} }", message: "SvcError::GrpcRequestError", kind: InvalidArgument }', retry scheduled @Fri, 13 Sep 2024 20:47:34 +0000 (20 seconds from now)
operator-diskpool at k8s/operators/src/pool/main.rs:55
operator-diskpool in kube_runtime::controller::reconciling object with object.ref: DiskPool.v1beta2.openebs.io/pool-1.core-openebs, object.reason: error policy requested retry
operator-diskpool
agent-core-grpc-probe openebs-agent-core (10.0.242.206:50051) open
io-engine: (from node 1)
io-engine [2024-09-13T20:48:54.920674168+00:00 INFO io_engine::grpc::v1::pool:pool.rs:508] CreatePoolRequest { name: "pool-1", uuid: None, disks: ["aio:////dev/mayastor?blk_size=4096"], pooltype: Lvs, cluster_size: None }
io-engine [2024-09-13T20:48:54.920752053+00:00 INFO io_engine::lvs::lvs_store:lvs_store.rs:508] Creating or importing lvs 'pool-1' from 'aio:////dev/mayastor?blk_size=4096'...
io-engine [2024-09-13T20:48:54.920784252+00:00 ERROR io_engine::lvs::lvs_store:lvs_store.rs:335] error=: volume is busy, failed to import pool //dev/mayastor
io-engine [2024-09-13T20:48:54.920796039+00:00 ERROR io_engine::lvs::lvs_store:lvs_store.rs:504] error=: volume is busy, failed to import pool //dev/mayastor
io-engine [2024-09-13T20:49:14.975887062+00:00 INFO io_engine::grpc::v1::pool:pool.rs:508] CreatePoolRequest { name: "pool-1", uuid: None, disks: ["aio:////dev/mayastor?blk_size=4096"], pooltype: Lvs, cluster_size: None }
io-engine [2024-09-13T20:49:14.975965970+00:00 INFO io_engine::lvs::lvs_store:lvs_store.rs:508] Creating or importing lvs 'pool-1' from 'aio:////dev/mayastor?blk_size=4096'...
io-engine [2024-09-13T20:49:14.975992582+00:00 ERROR io_engine::lvs::lvs_store:lvs_store.rs:335] error=: volume is busy, failed to import pool //dev/mayastor
io-engine [2024-09-13T20:49:14.976007324+00:00 ERROR io_engine::lvs::lvs_store:lvs_store.rs:504] error=: volume is busy, failed to import pool //dev/mayastor
agent-core:
agent-core 2024-09-13T20:50:34.192079Z ERROR core::pool::service: error: gRPC request 'create_pool' for 'Pool' failed with 'status: InvalidArgument, message: ": volume is busy, failed to import pool //dev/mayastor", details: ], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 13 Sep 2024 20:50:34 GMT", "content-length": "0"} }'
agent-core at control-plane/agents/src/bin/core/pool/service.rs:285
agent-core in core::pool::service::create_pool with request: CreatePool { node: NodeId("aks-storage-14636457-vmss000000"), id: PoolId("pool-0"), disks: [PoolDeviceUri("aio:////dev/mayastor?blk_size=4096")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: pool-0
agent-core
agent-core 2024-09-13T20:50:34.623039Z ERROR core::controller::io_engine::v1::pool: error: gRPC request 'create_pool' for 'Pool' failed with 'status: InvalidArgument, message: ": volume is busy, failed to import pool //dev/mayastor", details: ], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 13 Sep 2024 20:50:34 GMT", "content-length": "0"} }'
agent-core at control-plane/agents/src/bin/core/controller/io_engine/v1/pool.rs:33
agent-core in core::pool::service::create_pool with request: CreatePool { node: NodeId("aks-storage-14636457-vmss000002"), id: PoolId("pool-2"), disks: [PoolDeviceUri("aio:////dev/mayastor?blk_size=4096")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: pool-2
agent-core
agent-core 2024-09-13T20:50:34.630277Z ERROR core::pool::service: error: gRPC request 'create_pool' for 'Pool' failed with 'status: InvalidArgument, message: ": volume is busy, failed to import pool //dev/mayastor", details: ], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 13 Sep 2024 20:50:34 GMT", "content-length": "0"} }'
agent-core at control-plane/agents/src/bin/core/pool/service.rs:285
agent-core in core::pool::service::create_pool with request: CreatePool { node: NodeId("aks-storage-14636457-vmss000002"), id: PoolId("pool-2"), disks: [PoolDeviceUri("aio:////dev/mayastor?blk_size=4096")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: pool-2
agent-core
agent-core 2024-09-13T20:50:35.244397Z ERROR core::controller::io_engine::v1::pool: error: gRPC request 'create_pool' for 'Pool' failed with 'status: InvalidArgument, message: ": volume is busy, failed to import pool //dev/mayastor", details: ], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 13 Sep 2024 20:50:35 GMT", "content-length": "0"} }'
agent-core at control-plane/agents/src/bin/core/controller/io_engine/v1/pool.rs:33
agent-core in core::pool::service::create_pool with request: CreatePool { node: NodeId("aks-storage-14636457-vmss000001"), id: PoolId("pool-1"), disks: [PoolDeviceUri("aio:////dev/mayastor?blk_size=4096")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: pool-1
agent-core
agent-core 2024-09-13T20:50:35.262952Z ERROR core::pool::service: error: gRPC request 'create_pool' for 'Pool' failed with 'status: InvalidArgument, message: ": volume is busy, failed to import pool //dev/mayastor", details: ], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 13 Sep 2024 20:50:35 GMT", "content-length": "0"} }'
agent-core at control-plane/agents/src/bin/core/pool/service.rs:285
agent-core in core::pool::service::create_pool with request: CreatePool { node: NodeId("aks-storage-14636457-vmss000001"), id: PoolId("pool-1"), disks: [PoolDeviceUri("aio:////dev/mayastor?blk_size=4096")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: pool-1
agent-core
Issues go stale after 90d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
I've been looking at Longhorn, but it also has issues on AKS when it comes to upgrading clusters etc. (https://longhorn.io/docs/1.7.2/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks/), and to be honest this worries me quite a bit for a system that's supposed to be reliable and fault tolerant. I was hoping that OpenEBS might be a solution, but it looks like it also suffers from some of the same issues specific to how Azure AKS works 😞
I'm unsure how related this is to the current issue, but I need to write about it somewhere:
In a moment of brilliance I decided to rename all my nodes at the same time as I migrated from k3s to rke2 - effectively recreating the cluster. The rename was from FQDN to hostname. As you can imagine, OpenEBS didn't like that (note that I had no backups, since I didn't expect OpenEBS to behave that way). All my volumes appeared faulted.
So: same nodes, exact same paths and all, but completely new nodes as far as OpenEBS and rke2 are concerned.
Also, I had JUST set up OpenEBS (migrated from Longhorn) and have zero experience with it, so I don't even know what I'm doing.
This is how I recovered:
First of all, I changed each DiskPool's node field; while that ran fine, I don't think it actually took effect.
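(Concretely, that was just something like the following; the pool name, node name and namespace are placeholders.)
# Point the DiskPool CR at the new node name (all names here are placeholders).
kubectl -n openebs patch diskpool <pool name> --type=merge \
  -p '{"spec":{"node":"<new node name>"}}'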
Then, etcd editing:
k exec -it openebs-etcd-1 -- bash:
etcdctl snapshot save /bitnami/etcd/snapshot-$(date +%F)
clusterid="<the id>"
# manually - remember the single quotes in the put command!
etcdctl get /openebs.io/mayastor/apis/v0/clusters/$clusterid/namespaces/openebs/PoolSpec/<pool name>
etcdctl put /openebs.io/mayastor/apis/v0/clusters/$clusterid/namespaces/openebs/PoolSpec/<pool name> -- 'json from get but with the new "node" value'
etcdctl snapshot save /bitnami/etcd/snapshot-$(date +%F)-patchedpools
nexusspecs=$(etcdctl get --prefix "/openebs.io/mayastor/apis/v0/clusters/$clusterid/namespaces/openebs/NexusSpec" --keys-only)
for n in $nexusspecs; do
echo $n;
data=$(etcdctl get --print-value-only $n | sed -E 's/10.11.12/10.20.30/g' | sed -E 's/("node":"|node-name:)([a-zA-Z]{2,3})([0-9]).<mydomain>"/\1\2\3"/g');
echo -e "$data\n\n";
# etcdctl put $n -- "$data"; # commented out just in case someone is feeling spicy
done
etcdctl snapshot save /bitnami/etcd/snapshot-$(date +%F)-migrated
The long sed does a couple of things:
- Changes the IP of the nodes (yes, I changed node IPs as well 🧠)
- Changes the node value of the NexusSpec
- Changes the allowed_hosts value of the NexusSpec
I rollout-restarted almost everything (nats, ha, core) and the volumes started popping up: degraded at first, then back to Online as time passed!
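(For the record, the restarts were roughly the following; the exact Deployment/StatefulSet/DaemonSet names depend on your Helm release, so treat them as placeholders.)
# Restart the control-plane pieces so they pick up the patched specs
# (resource names vary per install -- the ones below are placeholders).
kubectl -n openebs rollout restart deployment openebs-agent-core
kubectl -n openebs rollout restart statefulset openebs-nats
kubectl -n openebs rollout restart daemonset openebs-agent-ha-node
kubectl -n openebs rollout restart daemonset openebs-io-engine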
Some questions, if anyone's still reading:
- please tell me if I need to do more stuff to finish the migration
- how do I get rid of the old nodes?
- how do I rename the pools? is it even worth it?
Key takeaway: NEVER rename your nodes all at once :)