cain
Backup and restore tool for Cassandra on Kubernetes
Cain
Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.
Cain supports the following cloud storage services:
- AWS S3
- Minio S3
- Azure Blob Storage
Cain is now an official part of the Helm incubator/cassandra chart!
Install
Prerequisites
- git
- dep
From a release
Download the latest release from the Releases page, or use it with a Docker image.
From source
mkdir -p $GOPATH/src/github.com/nuvo && cd $_
git clone https://github.com/nuvo/cain.git && cd cain
make
Commands
Backup Cassandra cluster to cloud storage
Cain performs a backup in the following way:
- Back up the keyspace schema (using cqlsh).
- Get the backup data using nodetool snapshot: this creates a snapshot of the keyspace in all Cassandra pods in the given namespace (according to the selector).
- Copy the files in parallel to cloud storage using Skbn: the files are copied to the specified dst, under namespace/<cassandraClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
- Clear all snapshots.
Usage
$ cain backup --help
backup cassandra cluster to cloud storage
Usage:
cain backup [flags]
Flags:
-a, --authentication use authentication for nodetool and cqlsh. Overrides $CAIN_AUTHENTICATION
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-u, --cassandra-username string cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
--dst string destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
-h, --help help for backup
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
--nodetool-credentials-file string path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
Examples
Backup to AWS S3
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
Backup to AWS S3 with Cassandra authentication enabled
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra \
-a \
-u cain \
--nodetool-credentials-file /home/cassandra/.nodetool/credentials
Backup to Azure Blob Storage
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst abs://my-account/db-backup-container/cassandra
Restore Cassandra backup from cloud storage
Cain performs a restore in the following way:
- Restore the schema if schema (-s) is specified.
- Truncate all tables in keyspace.
- Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/). Restore is only possible for the same keyspace schema.
- Load the new data using nodetool refresh.
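Because files are read from a directory named after the schema checksum, restoring into a keyspace whose schema differs from the backed-up one finds no files. The Go sketch below makes that rule explicit; restorePrefix and checkSchema are hypothetical helpers, and in practice the list of available hashes would come from listing <src>/<keyspace>/ in cloud storage.

```go
package main

import (
	"fmt"
	"strings"
)

// restorePrefix is a hypothetical helper: where restore reads files from,
// given --src (which already includes namespace and cluster name), the
// keyspace, the current keyspace schema hash, and --tag.
func restorePrefix(src, keyspace, schemaHash, tag string) string {
	return strings.TrimSuffix(src, "/") + "/" + keyspace + "/" + schemaHash + "/" + tag + "/"
}

// checkSchema fails fast when no backup exists for the schema the keyspace
// currently has, mirroring the "same keyspace schema" rule above.
func checkSchema(currentHash string, availableHashes []string) error {
	for _, h := range availableHashes {
		if h == currentHash {
			return nil
		}
	}
	return fmt.Errorf("no backup for schema %s (available: %s)", currentHash, strings.Join(availableHashes, ", "))
}

func main() {
	available := []string{"abc123"} // hypothetical listing result
	if err := checkSchema("abc123", available); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(restorePrefix("s3://db-backup/cassandra/default/ring01", "keyspace", "abc123", "20180903091624"))
}
```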
Usage
$ cain restore --help
restore cassandra cluster from cloud storage
Usage:
cain restore [flags]
Flags:
-a, --authentication use authentication for nodetool and cqlsh. Overrides $CAIN_AUTHENTICATION
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-u, --cassandra-username string cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-h, --help help for restore
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-f, --nodetool-credentials-file string path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-s, --schema string schema version to restore (optional). Overrides $CAIN_SCHEMA
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--src string source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
-t, --tag string tag to restore. Overrides $CAIN_TAG
--user-group string user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")
Examples
Restore from S3
cain restore \
--src s3://db-backup/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Restore from Azure Blob Storage
cain restore \
--src abs://my-account/db-backup-container/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Describe keyspace schema
Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).
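The checksum is what ties a backup directory to a schema: identical schema text yields the same hash, and any change yields a new one. A minimal Go sketch, assuming SHA-256 over the raw schema text; the algorithm Cain actually uses is not stated here, and schemaSum is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// schemaSum returns a hex checksum of a schema dump. SHA-256 is an assumption
// of this sketch, not necessarily what cain schema --sum computes.
func schemaSum(schema []byte) string {
	sum := sha256.Sum256(schema)
	return hex.EncodeToString(sum[:])
}

func main() {
	v1 := []byte("CREATE TABLE keyspace.users (id uuid PRIMARY KEY, name text);")
	v2 := []byte("CREATE TABLE keyspace.users (id uuid PRIMARY KEY, name text, email text);")
	fmt.Println(schemaSum(v1) == schemaSum(v1)) // same schema: same backup directory
	fmt.Println(schemaSum(v1) == schemaSum(v2)) // altered schema: a different directory
}
```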
Usage
$ cain schema --help
get schema of cassandra cluster
Usage:
cain schema [flags]
Flags:
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--sum print only checksum. Overrides $CAIN_SUM
Examples
cain schema \
-n default \
-l release=cassandra \
-k keyspace
cain schema \
-n default \
-l release=cassandra \
-k keyspace \
--sum
Environment variables support
Cain commands support environment variables as an alternative to flags. For example, the backup command can be executed with flags:
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
Alternatively, set the corresponding environment variables (CAIN_ followed by the flag name in upper case, with _ instead of -):
export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra
cain backup
Support for additional storage services
Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.
Skbn compatibility matrix
Cain version | Skbn version |
---|---|
0.5.2 | 0.4.2 |
0.5.1 | 0.4.2 |
0.5.0 | 0.4.1 |
0.4.2 | 0.4.1 |
0.4.1 | 0.4.1 |
0.4.0 | 0.4.0 |
0.3.0 | 0.3.0 |
0.2.0 | 0.2.0 |
0.1.0 | 0.1.1 |
Credentials
Kubernetes
Cain tries to get credentials in the following order:
- if the KUBECONFIG environment variable is set, Skbn will use the current context from that config file
- if ~/.kube/config exists, Skbn will use the current context from that config file with an out-of-cluster client configuration
- if ~/.kube/config does not exist, Skbn will assume it is working from inside a pod and will use an in-cluster client configuration
AWS
Skbn uses the default AWS credentials chain.
Azure Blob Storage
Skbn uses the AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.
Cassandra Credentials
When authentication is enabled, Cain looks for the default cqlsh credentials in /home/cassandra/.cassandra/credentials. If you use authentication, make sure the cassandra container has this file and that the username and password in it are correct.
For nodetool authentication, the default credentials file is /home/cassandra/.nodetool/credentials. It can be overridden by setting the --nodetool-credentials-file flag. When this flag is used, the username for nodetool authentication must be provided as well (-u, --cassandra-username).
Examples
- Helm example
- Code example