opensearch-k8s-operator [Proposal] Add snapshot capability to the OpenSearch cluster.

Background:

As part of the next-phase roadmap, snapshot capability will be added to the operator so that OpenSearch cluster snapshots can be configured through the base cluster YAML.

Proposal:

snapshot:
  type: s3
  snapshot_repo: <CUSTOM_NAME>
  ## User-required settings to connect to s3
  settings:
    bucket: my-bucket
    another_setting: setting-value
    region: us-east-1
    base_path: os-snapshot

Design

With the custom plugin installation capability, it is now possible to install repository-s3 (pluginsList: ["repository-s3"]). Using this plugin, the first step is to add snapshot capability to the operator so that snapshots are stored in an s3 bucket. Once a snapshot block of type s3 is added to the YAML together with the user-configured settings, an API call will be invoked against the cluster to register the repository. Example:

curl -XPUT "https://localhost:9200/_snapshot/my_s3_repository_1?pretty" -H 'Content-Type: application/json' -d '
{
  "type": "s3",
  "settings": {
    "bucket": "opensearch-s3-snapshot",
    "region": "us-east-1",
    "base_path": "os-snapshot"
  }
}'
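As a rough illustration of the step described above, the sketch below maps the proposed snapshot block to the method, path, and JSON body of the register-repository call. The helper name `build_register_request` is hypothetical and not part of the operator; the keys `type`, `snapshot_repo`, and `settings` follow the proposal, and nothing is sent to a cluster.

```python
import json
from urllib.parse import quote


def build_register_request(snapshot_cfg: dict) -> tuple[str, str, str]:
    """Return (method, path, body) for registering a snapshot repository.

    Hypothetical helper: it mirrors the proposed YAML layout only.
    """
    repo_name = snapshot_cfg["snapshot_repo"]
    body = {
        "type": snapshot_cfg["type"],
        # Settings are passed through unchanged; their content is user-owned.
        "settings": dict(snapshot_cfg.get("settings", {})),
    }
    # Percent-encode the repository name so it is safe as a path segment.
    path = f"/_snapshot/{quote(repo_name, safe='')}"
    return "PUT", path, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    cfg = {
        "type": "s3",
        "snapshot_repo": "my_s3_repository_1",
        "settings": {
            "bucket": "opensearch-s3-snapshot",
            "region": "us-east-1",
            "base_path": "os-snapshot",
        },
    }
    method, path, body = build_register_request(cfg)
    print(method, path)
    print(body)
```

Keeping the settings as an opaque pass-through means the same mapping could serve other repository types, since only the settings differ between them.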

Assumptions:

The repository-s3 plugin is pre-installed by the user.
The s3 bucket is created by the user, and the recommended s3 permissions are also handled by the user. More details: https://opensearch.org/docs/1.2/opensearch/snapshot-restore/

Sample AWS IAM policy to be added to the node role

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
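For users who manage several buckets, the policy above can be generated rather than hand-edited. The sketch below is one way to do that; the function name `s3_snapshot_policy` is hypothetical, and the actions are copied from the sample policy above without additions.

```python
import json


def s3_snapshot_policy(bucket: str) -> dict:
    """Build the sample IAM policy for a given bucket name (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListBucketVersions",
                ],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to every key inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_snapshot_policy("my-bucket"), indent=2))
```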

Snapshot Management and Shared responsibilities

The base snapshot setup will be created by the operator.
Users can write a custom CronJob to take snapshots periodically, for example with PUT _snapshot/my_s3_repository_1/%3Csnapshot-%7Bnow%2Fd%7D%3E

apiVersion: batch/v1
kind: CronJob
metadata:
  name: opensearch-snapshot-cron
spec:
  schedule: "@daily"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: opensearch-snapshot-cron
            image: centos:7
            command:
            - /bin/bash
            args:
            - -c
            ## The response can be better handled with jq to check for {"accepted":true}
            - 'curl -s -i -k -u admin:admin -XPUT "https://<SERVICE_NAME>:<PORT>/_snapshot/my_s3_repository_1/%3Csnapshot-%7Bnow%2Fd%7D%3E"'
          restartPolicy: OnFailure
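The snapshot name in the CronJob is the date-math expression <snapshot-{now/d}> in percent-encoded form. The sketch below shows how that path can be derived and how a response body could be checked for {"accepted":true}. Both helper names are hypothetical. The check expects the JSON body only, so it would not work on output that includes the headers added by curl -i.

```python
import json
from urllib.parse import quote


def snapshot_path(repository: str, name_pattern: str = "<snapshot-{now/d}>") -> str:
    """Return the request path for creating a snapshot with a date-math name."""
    # safe='' forces '/', '<', '>', '{' and '}' to be percent-encoded.
    return f"/_snapshot/{quote(repository, safe='')}/{quote(name_pattern, safe='')}"


def snapshot_accepted(response_body: str) -> bool:
    """Return True only if the body is JSON of the form {"accepted": true}."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("accepted") is True


if __name__ == "__main__":
    print(snapshot_path("my_s3_repository_1"))
    print(snapshot_accepted('{"accepted":true}'))
```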

Users can also use the snapshot management API to manage policies for the time and frequency of automatic snapshots.
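A minimal sketch of what such a policy body might look like, assuming the field layout described in the snapshot management API documentation (a cron schedule under `creation` and the target repository under `snapshot_config`). The helper name, description text, and default cron expression are placeholders; verify the fields against the documentation for the OpenSearch version in use before sending the body to `_plugins/_sm/policies/<policy_name>`.

```python
import json


def build_sm_policy(repository: str,
                    cron_expression: str = "0 0 * * *",
                    timezone: str = "UTC") -> dict:
    """Build an illustrative snapshot management policy body.

    Assumption: field names follow the documented SM policy layout; this
    helper is hypothetical and does not talk to a cluster.
    """
    return {
        "description": "Daily snapshot policy (illustrative)",
        "creation": {
            "schedule": {
                "cron": {"expression": cron_expression, "timezone": timezone}
            }
        },
        "snapshot_config": {
            # Repository must already be registered on the cluster.
            "repository": repository,
            "indices": "*",
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_sm_policy("my_s3_repository_1"), indent=2))
```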

Future Enhancement

Based on usage and user requests, shared-file-system capability will be added.

Sep 03 '22 22:09 prudhvigodithi

If I may suggest, it might be worth making the proposed snapshot config accept a list. This way it would be possible to configure more than one snapshot policy, e.g., for snapshotting to different repositories (S3 + GCS) or for snapshotting specific index patterns.

Also, ideally the 4 repository plugins listed in the official docs should be supported (which should be easy enough, given that only the repository settings would be different):

repository-azure
repository-gcs
repository-hdfs
repository-s3

Sep 05 '22 00:09 max-frank

My thoughts on this:

It should be possible to configure multiple repositories.
At least s3, azure and gcs repositories should be supported, so we cover the big 3 cloud providers.
Not sure if this needs to be in the first iteration, but we should offer users a way to automatically get regular snapshots (basically, the operator should create the cronjob for them).
The operator should report an error to the user if the needed repository plugin is not in the plugin list.
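The last point could be sketched roughly as follows. The function and the `name`/`type` keys are hypothetical, and the mapping from repository type to plugin name is an assumption based on the plugin names mentioned in this thread; the operator itself may validate this differently.

```python
# Assumed mapping from repository type to the plugin that provides it.
REQUIRED_PLUGIN = {
    "s3": "repository-s3",
    "azure": "repository-azure",
    "gcs": "repository-gcs",
    "hdfs": "repository-hdfs",
}


def missing_repository_plugins(repositories: list[dict],
                               plugins_list: list[str]) -> list[str]:
    """Return one error message per repository whose plugin is not configured."""
    installed = set(plugins_list)
    errors = []
    for repo in repositories:
        name = repo.get("name")
        repo_type = repo.get("type")
        plugin = REQUIRED_PLUGIN.get(repo_type)
        if plugin is None:
            errors.append(f"repository {name!r}: unsupported type {repo_type!r}")
        elif plugin not in installed:
            errors.append(f"repository {name!r}: plugin {plugin!r} is not in pluginsList")
    return errors


if __name__ == "__main__":
    repos = [{"name": "backups", "type": "s3"}, {"name": "archive", "type": "gcs"}]
    for message in missing_repository_plugins(repos, ["repository-s3"]):
        print(message)
```

Returning a list of messages instead of raising on the first problem lets all configuration errors be reported to the user in one pass.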

Sep 10 '22 09:09 swoehrl-mw

Hey @max-frank, thanks. As @swoehrl-mw mentioned, we can start with s3, azure and gcs repositories, but what I propose is to use snapshot: type: <cloud_provider> followed by the user-configured settings (s3 is used as an example above). Also, @swoehrl-mw, yes: to keep the initial rollout simple, I'm planning to add just the snapshot capability. I would leave the cronjob decision to the user, since there is always an _sm policy a user can configure from the dashboard, so the operator need not manage this overhead: https://opensearch.org/docs/latest/opensearch/snapshots/sm-api/. Also, instead of triggering an -XPUT API call from a cronjob, it's far better to do it via the dashboard using a snapshot-management policy; if not, the user can always create a cron job to manage the snapshots. WDYT? @segalziv @idanl21 @dbason

Sep 12 '22 02:09 prudhvigodithi

@prudhvigodithi: I suggest changing the config YAML structure a bit:

snapshot: 
  repository:
    name: <CUSTOM_NAME>
    type: s3
    settings:
      ##User required settings to connect to s3
      bucket: my-bucket
      another_setting: setting-value
      region: us-east-1
      base_path: os-snapshot

That way if we later decide to add options to configure snapshot schedules and the like we can add it as a key like snapshot.schedules.
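To illustrate how the flat layout from the original proposal and the nested layout could be read by the same code, here is a rough sketch. The function name is hypothetical, and accepting a list under `repository` is an assumption added to show room for multiple repositories; it is not part of either proposed layout.

```python
def normalize_repositories(snapshot_cfg: dict) -> list[dict]:
    """Return a list of {name, type, settings} dicts from either layout."""
    repo = snapshot_cfg.get("repository")
    if repo is None:
        # Flat layout: type, snapshot_repo and settings at the top level.
        return [{
            "name": snapshot_cfg["snapshot_repo"],
            "type": snapshot_cfg["type"],
            "settings": dict(snapshot_cfg.get("settings", {})),
        }]
    # Nested layout: a single mapping, or (assumed) a list of mappings.
    repos = repo if isinstance(repo, list) else [repo]
    return [
        {
            "name": r["name"],
            "type": r["type"],
            "settings": dict(r.get("settings", {})),
        }
        for r in repos
    ]


if __name__ == "__main__":
    nested = {"repository": {"name": "backups", "type": "s3",
                             "settings": {"bucket": "my-bucket"}}}
    print(normalize_repositories(nested))
```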

Sep 13 '22 14:09 swoehrl-mw

Something to also consider in terms of backups is the new remote backend storage feature released with 2.3.

While the feature is still experimental for now, it would probably be a good idea to consider it during the design of the configuration here, since it is also based on repositories. This is just to make sure that whatever format is chosen here can also support the eventual addition of remote backend storage.

Sep 16 '22 01:09 max-frank

One additional thing this needs for s3-compatible blob stores is setting the endpoint, etc. This has to be done in the opensearch.yml afaict.

https://opensearch.org/docs/latest/opensearch/snapshots/snapshot-restore#register-repository

Oct 19 '22 09:10 ibotty

Added in the last release (v2.3.0) as a BETA feature. Closing the issue; please open a new one for the implementation work that is left for GA.

May 10 '23 13:05 idanl21
