Add Generic Find/Replace Plugin for Restores

Open jordanwilson230 opened this issue 6 years ago • 34 comments

First of all, this is an incredible tool.

Issue: When restoring a backup onto a new cluster using, e.g., --namespace-mappings staging:staging-test or --namespace-mappings staging:staging, any Ingresses or LBs that are deployed will overwrite the DNS entries currently in place for the cluster that was cloned. This isn't necessarily undesirable (e.g., for disaster recovery), but it prevents ark from being used easily for cloning into new test environments. Would it be possible to search these resource-specific fields and replace them with a value specified with, e.g., --hosted-zone=test.example.com or --hosted-zone=staging-test.example.com?

I tried downloading the backup locally, unpacking it, and running sed replacements, but there were issues after restoring from the raw json via kubectl. I can't remember what the issues were, but I suspect it had to do with the ordering in which resources were created (I was pretty lazy in issuing a kubectl apply -f on all directories).

This may be something that cannot be easily resolved, but I thought it still worth asking. If not possible, the best method might be to exclude those resources and manually create them after a restore...I will test that as well.

jordanwilson230 avatar May 03 '18 15:05 jordanwilson230

@jordanwilson230 could you please give us a bit more information about what specific fields need to have their values changed? An example with yaml/json would be quite helpful. Thank you!

ncdc avatar May 03 '18 20:05 ncdc

@ncdc No problem, and thanks for looking at it. Here is an example for an ingress we're using:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: management-ingress
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
    - secretName: kubernetes.bitbrew.com
  rules:
  - host: api-service-staging.kubernetes.bitbrew.com
    http:
      paths:
      - path: /v1/*
        backend:
          serviceName: management-svc
          servicePort: 80
  - host: api-management-staging.hub.bitbrew.com
    http:
      paths:
      - path: /v1/*
        backend:
          serviceName: management-svc
          servicePort: 80

Here is an example of a LB we're using:

---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-lb
  annotations:
    external-dns.alpha.kubernetes.io/hostname: staging-rabbitmq.kubernetes.bitbrew.com.
spec:
  type: LoadBalancer
  ports:
  - name: client-port
    protocol: TCP
    port: 4443
    targetPort: 5672
  - name: srmqp-mgmt-port
    protocol: TCP
    port: 15671
    targetPort: 15671
  - name: sclient-port
    protocol: TCP
    port: 5671
    targetPort: 5671
  selector:
    app: rabbitmq

Note that for the last example, we're using an annotation for external-dns to pick up, which of course is different from the Ingress case. It'd be great to have, but I don't expect support for anything outside of the core K8s resources :)

Ideally, something such as --hosted-zone=test.kubernetes.bitbrew.com would replace all DNS hostname fields, e.g., staging-api.kubernetes.bitbrew.com with staging-api.test.kubernetes.bitbrew.com. This way, the cluster we restore into will not overwrite the DNS records for the cluster we've cloned from (which is still running). Of course, this assumes that the hosted zone, test.kubernetes.bitbrew.com, has been created ahead of time by us, which is fine.
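
A minimal sketch of that hostname rewrite, written in Python purely for illustration. It is not Ark code: the function name rehost and its parameters are hypothetical, and it assumes the rewrite keeps everything to the left of the base domain and re-roots it under the new hosted zone. Hostnames outside the base domain, or already inside the hosted zone, are left untouched.

```python
def rehost(hostname: str, base_domain: str, hosted_zone: str) -> str:
    """Move a hostname from base_domain into hosted_zone, keeping its prefix.

    Hypothetical helper: a trailing dot (as used in the external-dns
    annotation) is preserved.
    """
    trailing_dot = hostname.endswith(".")
    name = hostname.rstrip(".")

    # Already rewritten: do nothing, so the function is safe to run twice.
    if name.endswith("." + hosted_zone):
        return hostname

    suffix = "." + base_domain
    if not name.endswith(suffix):
        return hostname  # not part of the base domain: leave untouched

    prefix = name[: -len(suffix)]
    result = f"{prefix}.{hosted_zone}"
    return result + "." if trailing_dot else result


print(rehost("staging-api.kubernetes.bitbrew.com",
             "kubernetes.bitbrew.com",
             "test.kubernetes.bitbrew.com"))
```

With the Ingress above, only the host under kubernetes.bitbrew.com would change; the hub.bitbrew.com host would need a second call with its own base domain and hosted zone.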

jordanwilson230 avatar May 04 '18 14:05 jordanwilson230

@jordanwilson230 ok this is much clearer now, thanks. Would something like a generic find/replace plugin work for you, where you could configure it somewhat like this?

changes:
- resource: ingresses
  field: spec.rules[].host
  find: "*.kubernetes.bitbrew.com"
  replace: xyz...

ncdc avatar May 07 '18 21:05 ncdc

@ncdc Yes, that would work, but how would the example you gave be applied? I've taken a stab at building from source (working with collections in the service-restore.go file to do a replace), and I've looked at the plugin examples, but I'm not sure where that falls in. Also, I had never looked at Golang until now, so I'm not very proficient. Is the snippet you provided part of a k8s merge definition or something? Thanks again.

jordanwilson230 avatar May 07 '18 22:05 jordanwilson230

Right now I'm using a pretty nasty-looking sed command as part of my script. After downloading a backup:

    tar xzf "${BACKUP}-data.tar.gz"
    rm "${BACKUP}-data.tar.gz"
    # Search for DNS/hostnames in the backup and prefix with a user-specified value (e.g., test to create test.kubernetes.bitbrew.com)
    read -r -p "Enter a name to prefix to kubernetes.bitbrew.com and hub.bitbrew.com (e.g., test): " hosted_zone
    # List every file that mentions the domain (dots escaped so they match literally)
    files=($(grep -lrie '\.bitbrew\.com' ./resources))
    # Blank out last-applied-configuration and insert the prefix in front of the second-level zone
    for file in "${files[@]}"; do jq '.' "$file" | sed -e 's|last-applied-configuration":\(.*\)\\n"\(.*\)|last-applied-configuration":"{\\"\\"}"\2|g; s|\(.*\)\.\(.*\).bitbrew.com\(.*\)|\1.'"${hosted_zone}"'.\2.bitbrew.com\3|g' | jq -r '. | @json' > "${file}.copy"; done
    for file in "${files[@]}"; do mv "${file}.copy" "${file}"; done
    tar czf "${BACKUP}.tar.gz" resources
    gsutil cp "${BACKUP}.tar.gz" "gs://${BUCKET}/${BUCKET}/"

It seems to work, but I certainly prefer efficiency and simplicity!

jordanwilson230 avatar May 07 '18 22:05 jordanwilson230

This is a hypothetical configuration for a restore item action plugin that doesn't currently exist 😄.

ncdc avatar May 08 '18 16:05 ncdc

I have the same use case. 2 clusters: 1 for Dev, 1 for Prod, with 2 different domains for ingresses. We would like to clone an environment from the Prod cluster to the Dev cluster, for debugging purposes for example. The ideal case would be ark modifying the ingress domains on the fly when restoring. This could also apply to storageClass the day ark supports it. It's not our primary way to deploy resources on k8s, of course; we usually do it through our CI/CD pipeline. I was wondering if there is any way to use the ksonnet engine to do this in ark.

neith00 avatar Jun 05 '18 19:06 neith00

We'll work with @bryanl & team to figure out the best way to implement this.

ncdc avatar Jun 05 '18 19:06 ncdc

This is a great customer use-case for resources that are typically tied to a specific environment. External-dns, as described here, is one example; I would think kube-lego/cert-manager are other workloads that are highly dependent on fields like these.

+1 for unlocking new use cases!

Suggest renaming title of issue to Add Generic Find/Replace plugin for restores

rosskukulinski avatar Jun 14 '18 04:06 rosskukulinski

A generic find/replace plugin would be great for us too. Thinking about certain applications deployed that query GCP resources, these often reference specific zones/regions.

In the event of a region outage, we may choose to deploy a cluster (and other GCP resources) into another region. Any application that has this region set via config will now be incorrect; a find/replace would allow us to use common arguments/environment variables for these settings, and replace them with the correct values on a restore.

Evesy avatar Jul 25 '18 17:07 Evesy

Any update on this? We're at the point of deciding whether to implement an internal backup/restore process for spinning up test environments, but would greatly prefer using ark and the features that come with it! Thanks!

jordanwilson230 avatar Oct 25 '18 03:10 jordanwilson230

@jordanwilson230 we have not implemented this yet. We'd greatly appreciate your feedback as to how you think users should specify the transformations. This is something where the UI/UX really needs to be clear and easy.

ncdc avatar Oct 25 '18 15:10 ncdc

Perhaps this functionality would be a good fit at the plugin level? Would it be possible to provide examples specific to transforming resources (perhaps an example for each resource type)?

This plugin, for example: https://github.com/heptio/ark-plugin-example/blob/de9801def1466e73a72a62dd5c1a71dc479117b2/ark-restoreitemaction-log-and-annotate/myrestoreplugin.go#L45 shows how to add an annotation via k8s.io/apimachinery/pkg/api/meta and k8s.io/apimachinery/pkg/runtime. For those of us who don't know Go or are inexperienced with the Kubernetes API, it would be awesome if you guys were able to add other examples:

  • Rather than adding an annotation, searching for and replacing an annotation
  • Same as above, but an example outside meta...i.e., altering something at the spec.container level. Perhaps changing the value of some environment variable that was set at deployment, such as PRODUCTION: "false" instead of PRODUCTION: "true".
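
To make the two requested examples concrete, here is a rough Python sketch of the transformation logic only, operating on a resource already decoded into nested dicts. It is not Ark plugin code (a real restore item action would be written in Go against the plugin interface); the helper names replace_annotation and set_container_env are hypothetical, and the Deployment-style layout under spec.template.spec.containers is an assumption.

```python
import copy


def replace_annotation(obj, key, find, replace):
    """Return a copy of obj with `find` replaced by `replace` in one annotation value."""
    out = copy.deepcopy(obj)
    annotations = out.get("metadata", {}).get("annotations") or {}
    if key in annotations:
        annotations[key] = annotations[key].replace(find, replace)
    return out


def set_container_env(obj, name, value):
    """Return a copy of obj where env var `name` is set to `value` in every
    container that already defines it with a literal value."""
    out = copy.deepcopy(obj)
    pod_spec = out.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers") or []:
        for env in container.get("env") or []:
            if env.get("name") == name and "value" in env:
                env["value"] = value
    return out


deployment = {
    "metadata": {"annotations": {"example.com/owner": "team-staging"}},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "env": [{"name": "PRODUCTION", "value": "true"}]},
    ]}}},
}

restored = replace_annotation(deployment, "example.com/owner", "staging", "test")
restored = set_container_env(restored, "PRODUCTION", "false")
print(restored)
```

Both helpers copy the object rather than mutating it, so the original backup data stays intact if a transformation has to be retried.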

Plugins might offer users more flexibility and in-house customization without adding complexity to ark core. In that case, perhaps all that is needed are some code examples for search/replacing in https://github.com/heptio/ark-plugin-example

@ncdc, @rosskukulinski, what are your thoughts on using plugins for this?

jordanwilson230 avatar Oct 25 '18 16:10 jordanwilson230

I do think this can and should be a plugin. I'd like it to be generic enough that it meets the needs of as close to 100% of users as possible.

Do you anticipate that the same set of transformations would apply to every restore in a cluster? My guess is "no" but I'd love to hear the community's thoughts on this.

If the answer is "no", then we need a way to tell a restore which set of transformations to use. I think the easiest way to do this is probably with label selectors. We could store the transformation configurations as configmaps, and one of the pieces of data in each configuration would be a label selector to match against restores. The plugin would load all the transformation configmaps into memory, and then do selector matching against the labels on the restore.
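
As a rough illustration of that selector matching (a Python sketch, not Ark code), assume each transformation config has already been loaded from its configmap into a dict with a selector.matchLabels map. The function names and the dict layout are hypothetical, and treating an empty matchLabels as "matches every restore" is an assumption borrowed from how Kubernetes treats empty label selectors.

```python
def selector_matches(match_labels, restore_labels):
    """True if every key/value in match_labels is present on the restore."""
    return all(restore_labels.get(k) == v for k, v in match_labels.items())


def transformations_for(restore_labels, transformations):
    """Pick the transformation configs whose selector matches the restore's labels."""
    selected = []
    for t in transformations:
        match_labels = t.get("selector", {}).get("matchLabels", {})
        if selector_matches(match_labels, restore_labels):
            selected.append(t)
    return selected


configs = [
    {"name": "to-test-zone", "selector": {"matchLabels": {"color": "green"}}},
    {"name": "to-dev-zone", "selector": {"matchLabels": {"color": "blue"}}},
]

for t in transformations_for({"color": "green", "team": "platform"}, configs):
    print(t["name"])
```

A restore with no labels would then pick up only the configs whose matchLabels is empty, which may or may not be the desired default.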

ncdc avatar Oct 25 '18 17:10 ncdc

@ncdc I think pinning the transformations to labels is a great idea. I too would like to hear others' thoughts @Evesy @neith00

To answer your question, you're right; for it to be applicable to the larger community, I think users will eventually find a need to apply different sets of transformations (as you mentioned, for example, via config).

I'm starting to read more on the basics of Golang in an effort to help, but if anyone at Heptio eventually has the time for this feature, that'd be awesome! Love your guys' work.

jordanwilson230 avatar Oct 25 '18 17:10 jordanwilson230

We'd love to hear your thoughts on the structure of the configuration. In a previous comment, I suggested something like this:

changes:
- resource: ingresses
  field: spec.rules[].host
  find: "*.kubernetes.bitbrew.com"
  replace: xyz...

We'd probably want to take advantage of Kubernetes api machinery so it would probably look more like:

apiVersion: ark.heptio.com/v1
kind: RestoreTransformation
metadata:
  namespace: heptio-ark
  name: my-transformation-1
selector:
  matchLabels:
    color: green
changes:
- resource: ingresses
  field: spec.rules[].host
  find: "*.kubernetes.bitbrew.com"
  replace: xyz...

I'm primarily interested in brainstorming on the changes section:

  • How do we specify the name of the field to transform
    • What if there's an array in the path?
    • What if there's a map in the path?
    • Where do we allow wildcard matching?
  • Should find be optional? (my vote is yes)
  • etc
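
One way to explore these questions is a small Python sketch of how a single changes entry might be applied to a decoded resource. Everything here is an assumption for discussion, not an Ark design: the names apply_change and _walk are hypothetical, a trailing [] on a path segment means "every element of this array", find is an optional shell-style glob, and replace overwrites the whole matched value.

```python
from fnmatch import fnmatchcase


def _matches(value, find):
    """With no `find`, every value matches; otherwise glob-match strings only."""
    if find is None:
        return True
    return isinstance(value, str) and fnmatchcase(value, find)


def _walk(node, parts, find, replace):
    head, rest = parts[0], parts[1:]
    is_list = head.endswith("[]")
    key = head[:-2] if is_list else head
    if not isinstance(node, dict) or key not in node:
        return 0  # path does not exist on this object
    child = node[key]
    if is_list:
        if not isinstance(child, list):
            return 0
        if rest:
            return sum(_walk(item, rest, find, replace) for item in child)
        count = 0  # last segment is an array of scalars
        for i, value in enumerate(child):
            if _matches(value, find):
                child[i] = replace
                count += 1
        return count
    if rest:
        return _walk(child, rest, find, replace)
    if _matches(child, find):
        node[key] = replace
        return 1
    return 0


def apply_change(obj, field, replace, find=None):
    """Apply one change to obj in place and return how many values were replaced."""
    return _walk(obj, field.split("."), find, replace)


ingress = {"spec": {"rules": [
    {"host": "api-service-staging.kubernetes.bitbrew.com"},
    {"host": "api-management-staging.hub.bitbrew.com"},
]}}
print(apply_change(ingress, "spec.rules[].host",
                   replace="api-service-staging.test.kubernetes.bitbrew.com",
                   find="*.kubernetes.bitbrew.com"))
```

Splitting the path on dots is the obvious weak spot: map keys such as the external-dns.alpha.kubernetes.io/hostname annotation contain dots themselves, so the "map in the path" question would need a different path syntax or an escaping rule.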

ncdc avatar Oct 25 '18 17:10 ncdc

As I understand it, you are proposing to store transformations in configmaps, but I don't see how it would operate. Do you want to store the configmaps with transformations in the same namespace as ark or in the destination namespace?

neith00 avatar Oct 31 '18 10:10 neith00

In the same namespace as ark

ncdc avatar Oct 31 '18 11:10 ncdc

How would you treat the case where you only want to change the domain name in an ingress, as in your example, without using find?

neith00 avatar Oct 31 '18 14:10 neith00

If you want to match a specific ingress & its domain name and make a change, you would need to use find. Perhaps what I wrote above was confusing when I said find should be optional?

ncdc avatar Oct 31 '18 14:10 ncdc

Got it, you meant that the find field is optional. Now I get it.

neith00 avatar Oct 31 '18 14:10 neith00

@ncdc I think https://github.com/google/kasane could be useful for replacing complex values

neith00 avatar Nov 25 '18 11:11 neith00

If you squint, this is somewhat similar to the "general" problem we have of customizing manifests for deployment in the first place. Having the ability to compose those transformations via Plug-Ins would be very powerful indeed.

@neith00's suggestion of Kasane is very similar to where my mind was going with this problem, with something like https://github.com/kubernetes-sigs/kustomize

There are several projects that attempt to support patching/overlay of the resources and provide some of the "Hard Work" around this work.

jrnt30 avatar Nov 28 '18 00:11 jrnt30

cc @bryanl - would love to get your input & we should chat about this soon-ish

ncdc avatar Nov 28 '18 14:11 ncdc

I was asked by @nrb to add my use case's requirements to this issue. My organization is migrating to a new flavor of Kubernetes, and unfortunately not all of our resource definitions align with the new offering, so we cannot simply do an Ark/Velero backup and restore. Example: Tectonic has a custom ingress that won't line up with what we have in either a plain vanilla k8s offering or a more opinionated offering. It would be great to have a 'genericizer' for these resource types that checks the new and old clusters when the old resource definition isn't in the new cluster. Or, if the new cluster has its own 'opinionated' option for that resource type, it could make a best guess at lining that up in the new cluster (example: OpenShift CRDs).

justinhauer avatar Mar 12 '19 17:03 justinhauer

I have a similar use-case. We create 2 ingresses for apps on our k8s cluster:

  • A generic ingress per app for an environment (which we use for customer facing applications).
  • A cluster-specific ingress per app so that we can reach 2 instances of the same application deployed on 2 different clusters for the same environment. (used by Cluster Admins for testing while upgrading, new feature testing, etc).

Example: If my application name is nginx-app, the generic ingress for the stage environment would be nginx-app.stage.domain.net and the cluster-specific domains would be nginx-app.cluster1.stage.domain.net and nginx-app.cluster2.stage.domain.net, where cluster1 and cluster2 are the names of my k8s clusters.

While migrating workloads from cluster1 to cluster2, the generic domain remains the same, and hence no changes are needed there. But the cluster-specific ingress will still have the old hostname, and the app will not be reachable on the new cluster using the cluster-specific hostname.

What @ncdc proposed above (the RestoreTransformation configuration with a selector and a changes section) would probably work for all use-cases described in this issue.

rajakshay avatar Jul 23 '19 23:07 rajakshay

Use-case: stripping specific annotations from backup/restore operations.

I had an incident recently where I created a backup of a cluster in DigitalOcean, then tested restoring that backup into a separate cluster. Unfortunately, DigitalOcean stores the UUID of their load balancer as an annotation on the LoadBalancer service, which means that when the backup was restored, instead of provisioning a new DO Load Balancer, it fought the initial cluster for control of the existing one.

Being able to remove this annotation would make the backup/restore viable for DigitalOcean clusters.

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/016600fb188a6c2b9082f45ea67514e8ef9509b9/docs/getting-started.md#load-balancer-id-annotations
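
A minimal Python sketch of the stripping step, for illustration only (not Velero code; the helper name strip_annotations is hypothetical). The annotation key used in the example, kubernetes.digitalocean.com/load-balancer-id, is my reading of the linked document and should be treated as an assumption.

```python
import copy


def strip_annotations(obj, keys):
    """Return a copy of obj with the given annotation keys removed, if present."""
    out = copy.deepcopy(obj)
    annotations = out.get("metadata", {}).get("annotations")
    if annotations:
        for key in keys:
            annotations.pop(key, None)
    return out


service = {
    "kind": "Service",
    "metadata": {
        "name": "my-lb",
        "annotations": {
            # Assumed key; the value is a made-up placeholder.
            "kubernetes.digitalocean.com/load-balancer-id": "placeholder-uuid",
            "example.com/keep-me": "yes",
        },
    },
    "spec": {"type": "LoadBalancer"},
}

print(strip_annotations(service, ["kubernetes.digitalocean.com/load-balancer-id"]))
```

With the ID annotation gone, the restored Service would look like a brand-new LoadBalancer service to the cloud controller, which is the behaviour wanted for a clone.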

zikes avatar Sep 25 '19 16:09 zikes

+1 for this feature

debianmaster avatar Oct 24 '19 16:10 debianmaster

+1 for the feature! It would be very helpful in our use case as well.

Sushma10037017 avatar Nov 25 '19 09:11 Sushma10037017

See #2090 for a related request using kustomize.

skriss avatar Dec 13 '19 15:12 skriss