e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
I am on the last steps of the project, and when I run spark-submit I get a "cassandra module not found" error. I have checked all the jars and the cassandra-driver version...
Exception has occurred: AirflowConfigException
Cannot use relative path: `sqlite:///C:\Users\User_Win10x64/airflow/airflow.db` to connect to sqlite. Please use absolute path such as `sqlite:////tmp/airflow.db`.
  File "D:\Work\data-engineer\dags\kafka-stream.py", line 2, in <module>
    from airflow import DAG
airflow.exceptions.AirflowConfigException:...
Whenever we try to connect to Kafka, we get this error:
WARNING:root:kafka dataframe could not be created because: An error occurred while calling o36.load.
: java.lang.NoClassDefFoundError: scala/$less$colon$less
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateStreamOptions(KafkaSourceProvider.scala:338)
    at...
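A `NoClassDefFoundError` for `scala/$less$colon$less` commonly points to a Scala binary-version mismatch: `scala.<:<` is a top-level class only from Scala 2.13 onward, so a connector jar built for 2.13 will fail this way on a Spark runtime built with 2.12. Whether that is the cause here cannot be confirmed from the truncated trace. The sketch below is one way to check the package coordinates passed to spark-submit against the runtime's Scala version. The helper names and the example coordinates are illustrative assumptions, not taken from this repository.

```python
import re

# Matches a trailing Scala binary-version suffix such as "_2.12" or "_2.13".
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)$")


def scala_suffix(coordinate):
    """Return the Scala version encoded in a 'group:artifact:version' coordinate.

    Returns None when the artifact id carries no Scala suffix (plain Java jars).
    """
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    match = _SCALA_SUFFIX.search(parts[1])
    return match.group(1) if match else None


def mismatched_packages(coordinates, runtime_scala):
    """List the coordinates whose Scala suffix differs from the runtime's version."""
    return [
        c for c in coordinates
        if scala_suffix(c) not in (None, runtime_scala)
    ]


if __name__ == "__main__":
    # Illustrative coordinates only; substitute the ones actually passed to spark-submit.
    packages = [
        "org.apache.spark:spark-sql-kafka-0-10_2.13:3.4.1",
        "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1",
    ]
    # Assumes the Spark build uses Scala 2.12.
    print(mismatched_packages(packages, "2.12"))
```

The runtime's Scala version is printed by `spark-submit --version`. Any coordinate the helper returns would need to be swapped for the artifact with the matching suffix.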