spark-structured-streaming
Spark structured streaming with a Kafka data source, writing to Cassandra
This is an example of structured streaming with Spark v2.1.0.
A Spark job reads from a Kafka topic, manipulates the data as Datasets/DataFrames, and writes it to Cassandra.
Usage:

- Inside the `setup` directory, run `docker-compose up -d` to launch instances of `zookeeper`, `kafka`, and `cassandra` (a sketch of such a compose file follows this list).
- Wait a few seconds, then run `docker ps` to make sure all three services are running.
- Then run `pip install -r requirements.txt`.
- `main.py` generates some random data and publishes it to a topic in Kafka.
- Run the Spark app using `sbt clean compile run` in a console. This app listens on a topic (check `Main.scala`) and writes the incoming data to Cassandra (see the Scala sketch after this list).
- Run `main.py` again to write some test data to the Kafka topic.
- Finally, check that the data has been published to Cassandra:
  - Go to cqlsh: `docker exec -it cas_01_test cqlsh localhost`
  - Then run `select * from my_keyspace.test_table;`
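For reference, here is a minimal sketch of what the compose file in `setup` might look like. The images, ports, and environment variables are assumptions; only the `cas_01_test` container name is taken from the steps above, so defer to the actual file in the repository:

```yaml
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper   # assumed image; check setup/docker-compose.yml
    ports:
      - "2181:2181"

  kafka:
    image: wurstmeister/kafka       # assumed image
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: localhost
    depends_on:
      - zookeeper

  cassandra:
    image: cassandra:3              # assumed image
    container_name: cas_01_test     # name used by the cqlsh step above
    ports:
      - "9042:9042"
```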
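And here is a minimal sketch of the Spark side, assuming a topic named `test_topic` and a table `my_keyspace.test_table` with columns `id uuid` and `payload text` (the real topic and schema live in `Main.scala`). Spark 2.1 has no built-in Cassandra sink for structured streaming, so this sketch writes through a `ForeachWriter` backed by a plain DataStax driver session:

```scala
import com.datastax.driver.core.{Cluster, Session}
import org.apache.spark.sql.{ForeachWriter, SparkSession}

// One driver session per partition: opened in open(), reused in
// process(), torn down in close(). Defined as a top-level class so
// Spark can serialize it to the executors.
class CassandraSinkWriter extends ForeachWriter[String] {
  var cluster: Cluster = _
  var session: Session = _

  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint("localhost").build()
    session = cluster.connect()
    true
  }

  override def process(value: String): Unit =
    // Assumed schema: test_table(id uuid PRIMARY KEY, payload text)
    session.execute(
      "INSERT INTO my_keyspace.test_table (id, payload) VALUES (uuid(), ?)",
      value)

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}

object StreamToCassandra extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("kafka-to-cassandra")
    .getOrCreate()

  import spark.implicits._

  // Kafka source: key/value arrive as binary columns, so cast the value to a string.
  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "test_topic")  // hypothetical topic name
    .load()
    .selectExpr("CAST(value AS STRING)")
    .as[String]

  messages.writeStream
    .foreach(new CassandraSinkWriter)
    .start()
    .awaitTermination()
}
```

Opening one `Cluster` per partition keeps connections off the driver; a production job would typically reuse a pooled session per executor instead of rebuilding it in every `open` call.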
- Another branch, `avro-example`, contains Avro deserialization code.
Credits:
- This repository has borrowed some snippets from the killrweather app.