scylla-migrator
Migrates data to Scylla using Spark, typically from Cassandra or Parquet files. It can also migrate from DynamoDB to Scylla Alternator.
Building
- Make sure the Java 8 JDK and sbt are installed on your machine.
- Export the JAVA_HOME environment variable with the path to the JDK installation.
- Run build.sh.
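The prerequisites above can be checked before building. The following is a minimal sketch, not part of the repository: the helper names have_tool and check_build_env are hypothetical, and it only verifies that java and sbt are on the PATH and that JAVA_HOME is set, not that the JDK is actually version 8.

```shell
#!/bin/sh
# Illustrative pre-flight check for the build prerequisites.
# have_tool and check_build_env are hypothetical helpers, not shipped with the migrator.

# have_tool NAME: succeeds if NAME is found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_build_env: prints what is missing to stderr and
# returns non-zero if any prerequisite is absent.
check_build_env() {
    rc=0
    for tool in java sbt; do
        if ! have_tool "$tool"; then
            echo "missing: $tool" >&2
            rc=1
        fi
    done
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        rc=1
    fi
    return "$rc"
}

# Usage, from the repository root:
#   check_build_env && ./build.sh
```

Keeping the check separate from build.sh means a missing tool is reported up front rather than partway through the sbt build.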
Configuring the Migrator
Create a config.yaml for your migration using the template config.yaml.example in the repository root. Read the comments throughout carefully.
Running on a live Spark cluster
The Scylla Migrator is built against Spark 2.4.4, so you'll need to run that version on your cluster.
After running build.sh, copy the jar from ./migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar and the config.yaml you've created to the Spark master server.
Then, run this command on the Spark master server:
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
<path to scylla-migrator-assembly-0.0.1.jar>
If you need to pass a truststore file or other SSL-related files, use the --files option:
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
--files truststorefilename \
<path to scylla-migrator-assembly-0.0.1.jar>
Running the validator
This project also includes an entrypoint for comparing the source table and the target table. You can launch it as follows (after performing the previous steps):
spark-submit --class com.scylladb.migrator.Validator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
<path to scylla-migrator-assembly-0.0.1.jar>
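The migrator and validator invocations differ only in the class name, so the command line can be assembled by one small helper. This is a sketch under assumptions: build_submit_cmd is a hypothetical function, it only prints the command rather than running it, and it assumes the standalone master port 7077 used in the commands above and arguments without spaces.

```shell
#!/bin/sh
# build_submit_cmd ENTRYPOINT MASTER_HOST CONFIG JAR
# Hypothetical helper: prints the spark-submit command line for the
# Migrator or Validator entry point. It does not execute anything.
# Arguments are assumed to contain no whitespace.
build_submit_cmd() {
    entrypoint=$1
    master_host=$2
    config=$3
    jar=$4
    case "$entrypoint" in
        Migrator|Validator) ;;
        *) echo "unknown entry point: $entrypoint" >&2; return 1 ;;
    esac
    printf 'spark-submit --class com.scylladb.migrator.%s --master spark://%s:7077 --conf spark.scylla.config=%s %s\n' \
        "$entrypoint" "$master_host" "$config" "$jar"
}

# Usage: print the command first, then run it once it looks right.
#   build_submit_cmd Validator my-spark-master ./config.yaml ./scylla-migrator-assembly-0.0.1.jar
```

Printing instead of executing makes it easy to review the exact command before submitting a job to a live cluster.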
Running locally
To run in the local Docker-based setup:
- First start the environment:
docker compose up -d
- Launch cqlsh in Cassandra's container and create a keyspace and a table with some data:
docker compose exec cassandra cqlsh
<create stuff>
- Launch cqlsh in Scylla's container and create the destination keyspace and table with the same schema as the source table:
docker compose exec scylla cqlsh
<create stuff>
- Edit the config.yaml file; note the comments throughout.
- Run build.sh.
- Then, launch spark-submit in the master's container to run the job:
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
--master spark://spark-master:7077 \
--conf spark.driver.host=spark-master \
--conf spark.scylla.config=/app/config.yaml \
/jars/scylla-migrator-assembly-0.0.1.jar
The spark-master container mounts the ./migrator/target/scala-2.11 directory on /jars and the repository root on /app. To update the jar with new code, run build.sh again and then rerun spark-submit.
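For the two <create stuff> steps in the local setup, one possible way to create matching schemas without an interactive session is to pass the statements to cqlsh with its -e option. Everything about the schema here is an assumption made for illustration: the keyspace demo, the table demo.users, its columns, and the sample rows are all made up, and the helper names are hypothetical. Only the service names cassandra and scylla come from the commands above.

```shell
#!/bin/sh
# Hypothetical schema for trying out the local setup.
# The keyspace, table, columns, and rows below are invented for illustration.
SCHEMA_CQL="CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text);"

SEED_CQL="INSERT INTO demo.users (id, name) VALUES (1, 'alice');
INSERT INTO demo.users (id, name) VALUES (2, 'bob');"

# create_schema SERVICE: run the schema statements in the given compose service.
# -T disables TTY allocation so the call also works from scripts.
create_schema() {
    docker compose exec -T "$1" cqlsh -e "$SCHEMA_CQL"
}

# seed_source: insert the sample rows into the source (Cassandra) table only.
seed_source() {
    docker compose exec -T cassandra cqlsh -e "$SEED_CQL"
}

# Usage, after `docker compose up -d`:
#   create_schema cassandra && seed_source
#   create_schema scylla
```

Using the same SCHEMA_CQL for both containers keeps the destination schema identical to the source, which the setup steps require.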