trident
Trident to handle snapmirror release of volumes when deleting volumes
Describe the solution you'd like
It would be nice if Trident could handle releasing SnapMirror relationships itself instead of just parsing the delete error message from the backend.
Describe alternatives you've considered
We have considered writing an admission controller that would listen for PV/PVC delete events and handle the SnapMirror relationship there. It would be nicer to be able to do this within Trident.
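For illustration only, here is a rough sketch of what the decision logic of such an admission webhook might look like. It assumes the standard `admission.k8s.io/v1` AdmissionReview request/response shape; `release_mirrors` is a hypothetical callable standing in for whatever ONTAP automation would actually release the relationship, and mapping the PV/PVC name to the ONTAP volume is left out entirely.

```python
VOLUME_KINDS = {"PersistentVolume", "PersistentVolumeClaim"}


def is_volume_delete(review):
    """Return True if the AdmissionReview request is a DELETE of a PV or PVC."""
    request = review.get("request", {})
    kind = request.get("kind", {}).get("kind")
    return request.get("operation") == "DELETE" and kind in VOLUME_KINDS


def build_response(review, allowed, message=""):
    """Build an AdmissionReview response that echoes the request uid."""
    response = {"uid": review["request"]["uid"], "allowed": allowed}
    if message:
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


def handle(review, release_mirrors):
    """Release mirrors before a PV/PVC delete; allow everything else untouched.

    release_mirrors is a hypothetical hook (not a Trident or ONTAP API) that
    takes the object name and raises if the release fails.
    """
    if not is_volume_delete(review):
        return build_response(review, True)
    name = review["request"]["name"]
    try:
        release_mirrors(name)
    except Exception as exc:
        # Reject the delete so the volume is not left half cleaned up.
        return build_response(
            review, False, f"SnapMirror release failed for {name}: {exc}"
        )
    return build_response(review, True)
```

A real webhook would also need an HTTPS server, a webhook configuration that declares its side effects, and a way to resolve the backing volume, none of which is shown here.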
Additional context
Trident error message:
time="2020-09-24T05:53:56Z" level=error msg="Unable to delete volume from backend." backendUUID=a4736185-3aef-49fe-a355-793a30525f3b error="error destroying volume xxxxx_pr_prod_yyyy_data_yyyy_cluster_1_kafka_0_e4c08: API status: failed, Reason: Volume \"xxxxx_pr_prod_yyyy_data_yyyy_cluster_1_kafka_0_e4c08\" in Vserver \"xxx-nns017\" is the source endpoint of one or more SnapMirror relationships. Before you delete the volume, you must release the source information of the SnapMirror relationships using \"snapmirror release\". To display the destinations to be used in the \"snapmirror release\" commands, use the \"snapmirror list-destinations -source-vserver xxx-nns017 -source-volume xxxxx_pr_prod_yyyy_data_yyyy_cluster_1_kafka_0_e4c08\" command., Code: 18436" volume=prod-yyyy-data-yyyy-cluster-1-kafka-0-e4c08
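As a stopgap, external automation could recognize this specific failure from the log line. The snippet below is a minimal sketch, assuming the error text keeps the shape shown above (the `Code: 18436` suffix and the embedded `snapmirror list-destinations` hint); the function names are made up for illustration and are not part of Trident.

```python
import re

# Error code that appears at the end of the ONTAP message in the log above.
SNAPMIRROR_SOURCE_CODE = 18436

# The message embeds the exact list-destinations command; pull the vserver
# and volume out of it rather than guessing them from the PV name.
_HINT = re.compile(r"-source-vserver ([\w.-]+) -source-volume (\w+)")


def parse_snapmirror_delete_error(message):
    """Return (vserver, volume) for the SnapMirror-source error, else None."""
    if f"Code: {SNAPMIRROR_SOURCE_CODE}" not in message:
        return None
    match = _HINT.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


def list_destinations_command(vserver, volume):
    """Rebuild the ONTAP CLI command quoted in the error message."""
    return (
        "snapmirror list-destinations "
        f"-source-vserver {vserver} -source-volume {volume}"
    )
```

The output of that command is what an administrator would then feed into `snapmirror release`, as the error message itself suggests.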
I'm interested in this as well. Did you end up writing that controller, @loxley?
A customer using AWS FSx ONTAP is seeing exactly the same issue. Trident fails to delete the PV due to the SnapMirror between regions. With FSx ONTAP, that mirroring is not user controllable so the mirror cannot be released. This requires the user to delete the PV manually then go into the FSx ONTAP console and delete the volume.
@loxley @JanusTekk @forgosh I tried this out with Trident v22.01 and an FSx backend (multi-AZ, 1 SVM). I could not see any SnapMirrors created on the volumes. Were the SnapMirrors created by FSx for you? Or was it a manual SnapMirror that was created out-of-band?
@loxley @JanusTekk @forgosh I ran into this too. Deleting the PVC succeeded, but the PV (and the backing FSx volume) can't be deleted by Trident. The error I get is just as described in this issue. I dug deeper and tried to list available snapmirrors using the fsxadmin user role, but weirdly they are not listed.
FsxId05565399c42b74f2d::> snapmirror show -destination-path nfs_svm:trident_pvc_983996d5_982d_434c_a808_3b3df458b71b
There are no entries matching your query.
FsxId05565399c42b74f2d::> snapmirror show -source-path nfs_svm:trident_pvc_983996d5_982d_434c_a808_3b3df458b71b
There are no entries matching your query.
So it appears that all volumes created on FSx have a SnapMirror created on them that isn't visible through FSx. In addition, I didn't find a config knob when setting up FSx to manage this.
Need this for a customer as well. The k8s team doesn't have access to ONTAP outside of the Trident integration, so their storage team has implemented automation to establish the DP volume and SnapMirror relationship for a subset of PVCs. This is not a static list: new PVCs are automatically matched and processed based on specific volume criteria.
This environment has been online for a couple years now, and as they're attempting to retire and delete PVCs, they're running into this challenge.
They can create their own post-delete automation to clean up the DP if the primary has been removed. Because of this issue, though, the k8s team can't delete the primary via Trident. That requires admin intervention and cross-team coordination to remove the mirror relationship and then remove the PVC via Trident, all before automation re-establishes the replication. Manual processes are understandably not desired for this.
The immediate scope involves cleanup and retirement of hundreds of PVCs.
Help prioritizing and resolving this issue would be appreciated.
@klichwalla,
Unfortunately Trident can't make a determination to remove SnapMirror relationships that Trident didn't create. SnapMirror has a complex set of APIs to work with and being able to reliably break an SVM relationship that Trident didn't create is not something that can be supported. If not done correctly the SnapMirror relationship can be left in an error state that requires more manual intervention.
With the recent Astra Control Center release, replication is now supported. This is the current recommended approach to providing replication support when using Trident.