spark-structured-streaming
Spark structured streaming with a Kafka data source, writing to Cassandra
This is an example of structured streaming with Spark v2.1.0.
A Spark job reads from a Kafka topic, manipulates the data as Datasets/DataFrames, and writes it to Cassandra.
Usage:

- Inside the `setup` directory, run `docker-compose up -d` to launch instances of `zookeeper`, `kafka`, and `cassandra` (a sketch of such a compose file follows this list).
- Wait a few seconds, then run `docker ps` to make sure all three services are running.
- Then run `pip install -r requirements.txt`.
- `main.py` generates some random data and publishes it to a topic in Kafka.
- Run the Spark app using `sbt clean compile run` in a console. This app listens on a topic (check `Main.scala`) and writes the incoming data to Cassandra (see the Scala sketch after this list).
- Run `main.py` again to write some test data to the Kafka topic.
- Finally, check that the data has been published to Cassandra:
  - Go to cqlsh: `docker exec -it cas_01_test cqlsh localhost`
  - Then run `select * from my_keyspace.test_table;`
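For reference, here is a minimal sketch of what the compose file in `setup` might look like. The images, ports, and environment variables are assumptions; only the `cas_01_test` container name is taken from the steps above, so defer to the actual file in the repository:

```yaml
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper   # assumed image; check setup/docker-compose.yml
    ports:
      - "2181:2181"

  kafka:
    image: wurstmeister/kafka       # assumed image
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: localhost
    depends_on:
      - zookeeper

  cassandra:
    image: cassandra:3              # assumed image
    container_name: cas_01_test     # name used by the cqlsh step above
    ports:
      - "9042:9042"
```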
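And here is a minimal sketch of the Spark side, assuming a topic named `test_topic` and a table `my_keyspace.test_table` with columns `id uuid` and `payload text` (the real topic and schema live in `Main.scala`). Spark 2.1 has no built-in Cassandra sink for structured streaming, so this sketch writes through a `ForeachWriter` backed by a plain DataStax driver session:

```scala
import com.datastax.driver.core.{Cluster, Session}
import org.apache.spark.sql.{ForeachWriter, SparkSession}

// One driver session per partition: opened in open(), reused in
// process(), torn down in close(). Defined as a top-level class so
// Spark can serialize it to the executors.
class CassandraSinkWriter extends ForeachWriter[String] {
  var cluster: Cluster = _
  var session: Session = _

  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint("localhost").build()
    session = cluster.connect()
    true
  }

  override def process(value: String): Unit =
    // Assumed schema: test_table(id uuid PRIMARY KEY, payload text)
    session.execute(
      "INSERT INTO my_keyspace.test_table (id, payload) VALUES (uuid(), ?)",
      value)

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}

object StreamToCassandra extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("kafka-to-cassandra")
    .getOrCreate()

  import spark.implicits._

  // Kafka source: key/value arrive as binary columns, so cast the value to a string.
  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "test_topic")  // hypothetical topic name
    .load()
    .selectExpr("CAST(value AS STRING)")
    .as[String]

  messages.writeStream
    .foreach(new CassandraSinkWriter)
    .start()
    .awaitTermination()
}
```

Opening one `Cluster` per partition keeps connections off the driver; a production job would typically reuse a pooled session per executor instead of rebuilding it in every `open` call.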
- Another branch, `avro-example`, contains Avro deserialization code.
Credits:
- This repository has borrowed some snippets from the killrweather app.