Spring2024_Data_Streaming_Platform_with_Apache_Kafka

Open heanhsok opened this issue 1 year ago • 6 comments

Description: The project sets up Apache Kafka and a PostgreSQL database in one or more Docker containers, orchestrated with docker-compose, to build a data streaming platform. Python producers fetch data from an external source, format it for ingestion, and publish it to Kafka topics. Python-based Kafka consumers perform some EDA in a Jupyter notebook, then process and validate the data before storing it in PostgreSQL using a predefined schema. The goal is a reliable system that downloads, processes, and securely stores external data in real time, with Kafka as the intermediary, Python for the logic, and Docker for deployment flexibility.
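
As a rough sketch of the producer-side step described above (formatting a fetched record for a Kafka topic), the snippet below turns a Python dict into the key/value bytes a producer would send. The field names (`symbol`, `price`, `ts`), the topic name, and the `send` callable are all hypothetical and not taken from the project; in a real setup `send` would wrap a Kafka client's produce call.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical topic name, used only for illustration.
TOPIC = "external-data"


def to_kafka_message(record: Dict) -> Tuple[bytes, bytes]:
    """Serialize a fetched record into (key, value) bytes for a Kafka topic.

    Assumes each record has a 'symbol' field that serves as the message key.
    With Kafka's default partitioner, equal keys go to the same partition.
    """
    key = str(record["symbol"]).encode("utf-8")
    # sort_keys gives a stable byte representation for identical records.
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return key, value


def publish(records: List[Dict], send: Callable[[str, bytes, bytes], None]) -> int:
    """Format each record and hand it to `send(topic, key, value)`.

    `send` is a stand-in for a real producer call; returns the number sent.
    """
    count = 0
    for record in records:
        key, value = to_kafka_message(record)
        send(TOPIC, key, value)
        count += 1
    return count


if __name__ == "__main__":
    sent = []  # collects messages instead of sending them to a broker
    publish(
        [{"symbol": "BTC", "price": 64000.5, "ts": 1713000000}],
        lambda topic, key, value: sent.append((topic, key, value)),
    )
    print(sent)
```

Passing `send` in as a parameter keeps the formatting logic testable without a running broker.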


heanhsok avatar Apr 11 '24 14:04 heanhsok

Hi Prof. @gpsaggese and @Shaunak01. Instead of using a Jupyter notebook as my consumer, I want to use a plain Python script running inside a container that reads data from the Kafka stream. The script will do some validation before ingesting the data into Postgres. Is this okay?
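
A minimal sketch of what such a validate-then-store script might look like, under stated assumptions: the message format (JSON with `symbol`, `price`, `ts`), the validation rules, and the `ticks` table are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import json
import sqlite3
from typing import Iterable, Optional, Tuple

# sqlite3 is a stand-in for PostgreSQL; table and column names are hypothetical.
CREATE_SQL = "CREATE TABLE IF NOT EXISTS ticks (symbol TEXT, price REAL, ts INTEGER)"
INSERT_SQL = "INSERT INTO ticks (symbol, price, ts) VALUES (?, ?, ?)"


def validate(raw: bytes) -> Optional[dict]:
    """Decode a message value and return the record, or None if it is invalid."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict):
        return None
    symbol, price, ts = record.get("symbol"), record.get("price"), record.get("ts")
    if not isinstance(symbol, str) or not symbol:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price < 0:
        return None
    if isinstance(ts, bool) or not isinstance(ts, int) or ts <= 0:
        return None
    return {"symbol": symbol, "price": float(price), "ts": ts}


def ingest(messages: Iterable[bytes], conn: sqlite3.Connection) -> Tuple[int, int]:
    """Insert the valid messages and return (stored, rejected) counts."""
    conn.execute(CREATE_SQL)
    stored = rejected = 0
    for raw in messages:
        record = validate(raw)
        if record is None:
            rejected += 1
            continue
        conn.execute(INSERT_SQL, (record["symbol"], record["price"], record["ts"]))
        stored += 1
    conn.commit()
    return stored, rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [b'{"symbol": "BTC", "price": 64000.5, "ts": 1713000000}', b"not json"]
    print(ingest(batch, conn))
```

In the actual consumer, `messages` would come from polling the Kafka topic, and with a PostgreSQL driver such as psycopg2 the `?` placeholders would be written as `%s`.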

heanhsok avatar Apr 17 '24 22:04 heanhsok

Yes, seems okay to me.


Shaunak01 avatar Apr 18 '24 16:04 Shaunak01

Here is the progress I have made so far:

  • Created a docker-compose file that defines all the containers needed for the project (such as Zookeeper, Kafka Broker, Jupyter, Schema Registry, Postgres)
  • Created a Kafka tutorial in a Jupyter notebook
  • Created a producer and a consumer that continuously produce and consume a stream of messages
  • Wrote a draft of the project documentation, which includes the following sections:
    • Overview
    • Technologies Used
    • Project Structure
    • Docker Implementation
    • How to Run
    • Kafka Tutorial
    • Implementing Data Streaming Platform
    • Cleaning Up

TODO:

  • Add an example of schema validation for the producer and consumer, using Apache Avro for data serialization
  • Add an example of data validation logic in our consumer
  • Improve the README by explaining concepts such as Kafka Cluster, Topics, Messages, Partitioning, Replication Factor, Producer, Consumer, Consumer Group, Schema Registry
  • Add an explanation to the Implementing Data Streaming Platform section
  • Add comments to code
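
To illustrate the schema-validation item above, here is a simplified sketch that checks a record against an Avro-style record schema. It is a hand-rolled stand-in using only the standard library, not the Avro library or the Schema Registry, and it covers only a few primitive types; the `Tick` schema and its fields are hypothetical.

```python
from typing import Any, Dict, List

# Avro-style record schema; the record and field names are hypothetical.
SCHEMA = {
    "type": "record",
    "name": "Tick",
    "fields": [
        {"name": "symbol", "type": "string"},
        {"name": "price", "type": "double"},
        {"name": "ts", "type": "long"},
    ],
}

# Maps the few Avro primitive types handled here to Python types.
# This sketch accepts ints for "double" fields.
_PY_TYPES = {
    "string": (str,),
    "double": (float, int),
    "long": (int,),
    "boolean": (bool,),
}


def schema_errors(record: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    for name, avro_type in expected.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int, so reject it for non-boolean fields.
        type_ok = isinstance(value, _PY_TYPES[avro_type]) and (
            avro_type == "boolean" or not isinstance(value, bool)
        )
        if not type_ok:
            errors.append(f"field {name}: expected {avro_type}")
    for name in record:
        if name not in expected:
            errors.append(f"unexpected field: {name}")
    return errors


if __name__ == "__main__":
    print(schema_errors({"symbol": "BTC", "price": 64000.5, "ts": 1713000000}))
    print(schema_errors({"symbol": "BTC", "price": "n/a"}))
```

A producer could run this check before serializing and a consumer after deserializing, so malformed records are caught on both sides.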

heanhsok avatar Apr 26 '24 20:04 heanhsok

Additional Completed Items:

  • Add an example of schema validation for the producer and consumer, using Apache Avro for data serialization
  • Improve the README by explaining concepts such as Kafka Cluster, Topics, Messages, Partitioning, Replication Factor, Producer, Consumer, Consumer Group, Schema Registry
  • Add an explanation to the Implementing Data Streaming Platform section

TODO:

  • Add example of data validation logic in our consumer
  • Add comments to code

heanhsok avatar May 02 '24 20:05 heanhsok

Additional Completed Items:

  • Add example of data validation logic in our consumer
  • Add comments to code

TODO:

  • Record a short video explaining the project and put it in the PR

heanhsok avatar May 07 '24 16:05 heanhsok

Additional Completed Items:

  • Record a short video walkthrough of the project.

Project is completed.

heanhsok avatar May 09 '24 20:05 heanhsok